I seem to be spending a small fortune on taxi journeys. Twice now the taxi driver has actually asked me for directions to the destination. I could not believe one of the guys did not know where the Moscone Center was!
I would not want to not know where I was going in SF.
Active-Active Datacenters – Ashish Ray & Lawrence To
This is a bit of a high level overview of various Oracle techniques for distributed active datacenters
definition: independent loosely coupled systems that are kept synchronised
how far apart can sites be:
need to be aware of the network, latency & bandwidth implications
how is data kept in sync:
host based replication either within the db or 3rd party
storage array based mirroring: simpler, but with drawbacks (propagation of data corruption, less network utilisation)
can all db’s be read/write?
discussion on techniques for avoiding conflicts when writing to multiple active db’s; ideas include partitioning the data, e.g. emea partitions & apac partitions
how is high availability maintained
need fine grained monitoring. There is a need to measure the latency
how easily can the configuration be managed
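The partitioning idea for avoiding write conflicts mentioned above could look something like this. It is only a minimal sketch: the site names and the `OWNER_SITE` mapping are made up for illustration and were not part of the talk. The point is simply that each region's rows have exactly one site that accepts writes, so two sites never update the same row.

```python
# Hypothetical mapping (not from the talk): the one site that owns
# writes for each region's partition of the data.
OWNER_SITE = {
    "emea": "london_db",
    "apac": "singapore_db",
}


def site_for_write(region: str) -> str:
    """Return the only site allowed to accept writes for this region."""
    key = region.lower()
    if key not in OWNER_SITE:
        raise ValueError(f"no owning site configured for region {region!r}")
    return OWNER_SITE[key]
```

Reads can still go to any site; only writes are pinned to the owner.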
RAC extended cluster – better for 25km or less. Cache fusion and disk I/O traffic have to traverse the inter-site network, so there is additional network latency. Needs to be carefully performance tested. The advantage is that both sites can be active and there are no conflicts – it is the same database.
Still need dataguard, and need to be aware of upgrades/patches. Need a 3rd site for the voting disk.
11g ASM preferred reads facilitate stretch clusters by letting a node read only from the failure groups local to it. Fast disk resync also helps.
Does not provide full HA/DR.
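To put a rough number on why distance matters for an extended cluster, here is a back-of-envelope sketch. It assumes light travels at roughly 200,000 km/s in fibre (about two thirds of its speed in a vacuum) and ignores switches, protocol overhead and the actual cable route, so real figures will be worse. The function name is my own.

```python
# Assumption: ~200,000 km/s in optical fibre, i.e. 200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip propagation delay between two sites."""
    return 2 * distance_km / FIBRE_KM_PER_MS


# 25 km apart: 0.25 ms added to every inter-site round trip, before any
# equipment or protocol overhead is counted.
print(round_trip_ms(25))
```

A quarter of a millisecond sounds small, but it is paid on every cache fusion block transfer and every remote disk write, which is why the performance testing advice above matters.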
Active dataguard – distance is not an issue, particularly with ASYNC. Works for deploying read only applications. Lawrence thinks dataguard really shines in terms of manageability. Obviously it does provide full HA/DR.
Streams – the only real option for multiple highly distributed active read/write databases. Allows replication of the entire database or just a subset – it is extremely flexible. No real distance limitation, as it uses TCP/IP for propagation of changes. Various options for conflict resolution.
11g package DBMS_COMPARISON to compare tables and merge differences.
Managed and configured via Enterprise Manager or PL/SQL APIs. Performance tuning is key.
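The compare-and-merge idea mentioned for DBMS_COMPARISON can be pictured with a small sketch. This is not the Oracle API, just my own illustration in Python: each table is modelled as a dict keyed by primary key, and the "local wins" merge rule is an assumption.

```python
def compare_tables(local: dict, remote: dict):
    """Return (keys only in local, keys only in remote, keys whose rows differ)."""
    only_local = sorted(local.keys() - remote.keys())
    only_remote = sorted(remote.keys() - local.keys())
    different = sorted(
        key for key in local.keys() & remote.keys() if local[key] != remote[key]
    )
    return only_local, only_remote, different


def converge(local: dict, remote: dict) -> None:
    """Make remote match local in place (assumed rule: local wins)."""
    only_local, only_remote, different = compare_tables(local, remote)
    for key in only_local + different:
        remote[key] = local[key]
    for key in only_remote:
        del remote[key]
```

Usage would be something like calling `compare_tables` first to see how far apart the two copies have drifted, and only then `converge` if the differences look sane.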
Global Scale Web 2.0 – Wei Hu
Sharding is an application managed scaling technique using many, many databases. The reason is that a single db can’t cope with the volume of transactions, so you subset the data into multiple db’s. It’s then up to the application to route queries to the appropriate db.
Shards are replicated. This is the dominant technique for large scale websites. An unnamed social network site uses 1800 db’s – he did not say whether they were oracle or mysql! That is horizontal sharding.
Another technique is to have one master and then fan out changes to a read farm that supports reads on a sharded basis.
Very common with mysql. Heck, this is an oracle employee presenting and mentioning mysql!
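A minimal sketch of what "the application routes queries to the appropriate db" can mean in practice. The shard names and the choice of a hash on the user id are assumptions for illustration; the talk did not say how any particular site picks a shard.

```python
import hashlib

# Hypothetical shard names; a real deployment would hold connection details.
SHARDS = ["shard_db_0", "shard_db_1", "shard_db_2", "shard_db_3"]


def shard_for(user_id: str, shards=SHARDS) -> str:
    """Map a user id to one shard using a stable hash of the id."""
    # hashlib is used rather than the built-in hash(), which is randomised
    # per process for strings and so would not route consistently.
    digest = hashlib.sha1(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

Note that plain modulo routing like this means adding a shard remaps most keys, which is one reason growing the number of shards is painful.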
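The master plus read farm arrangement can be sketched roughly as below. It is a simplification: the class name is my own, reads are spread round-robin rather than on a sharded basis, and a statement counts as a read only if it starts with SELECT.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the single master and spread reads over replicas."""

    def __init__(self, master: str, replicas: list):
        self.master = master
        self._replicas = itertools.cycle(replicas)  # simple round-robin

    def route(self, sql: str) -> str:
        """Return the name of the database that should run this statement."""
        is_read = sql.lstrip().lower().startswith("select")
        return next(self._replicas) if is_read else self.master
```

Because the replicas are fed asynchronously from the master, a read issued straight after a write may not see it yet; the application has to tolerate that.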
Apparently Oracle has lots of techniques that are useful for sharded db’s
Challenges include schema changes, failures and corruptions.
Claiming that a non-Oracle social networking site has a nightmare with schema changes – they are offline.
Schema changes with mysql give 2 choices: a total outage by making changes to all shards simultaneously, or a shard-at-a-time approach.
Claiming oracle does online schema changes, though I’m pretty sure I’ve seen releases causing application failure with Oracle.
Shards need to be replicated. He is really going for mysql, saying mysql replication is terrible: the storage engine and replication state may become inconsistent.
Apparently Google have made significant changes to mysql for replication and have effectively forked it, as mysql did not accept the changes back into the codebase.
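The shard-at-a-time choice is essentially a rolling change. A rough sketch, where `apply_change` is a hypothetical callback that runs the DDL against one shard and raises on failure:

```python
def rolling_schema_change(shards, apply_change):
    """Apply a change one shard at a time, stopping at the first failure.

    Returns (shards changed so far, None) on success, or
    (shards changed so far, (failed shard, exception)) on failure.
    """
    done = []
    for shard in shards:
        try:
            apply_change(shard)
        except Exception as exc:
            # Stop here so the remaining shards stay on the old schema.
            return done, (shard, exc)
        done.append(shard)
    return done, None
```

The catch is that while this runs, some shards have the new schema and some the old, so the application has to work against both until the rollout finishes.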
Now saying Oracle replication has been available since version 7 and that it is highly stable.
Praising Active Dataguard, but that ain’t gonna help with sharding for writes. It is obviously useful for reader farms, but would need all writes to go to a master. Obviously it’s good for failure as well.
Increasing data volumes lead to a higher probability of data corruption. The ideal sharding solution should detect corrupt data and prevent it from being written. Pushing Oracle dataguard as protection against corruption and lost writes. Flashback allows recovery of data? High performance backup and recovery.
Now bashing mysql over the number of cores it supports, except watch those license fees escalate with the number of cpu’s you have. Of course if you are sharding, would you not just add more nodes and shard at a finer grain!?!
Talking about scaling to large memory as well.
Best to integrate with mid-tier caching, in particular memcached. You must invalidate the mid-tier cache when the data changes so the cache is always in sync with the data. With mysql apparently you need database triggers to identify updated rows, or change mysql to log additional data.
Oracle’s idea is to use LogMiner! This is to directly return the primary keys of all changed rows from the redo logs to refresh the front end cache – I’m not sure how magic that sounds!
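Whichever way the changed primary keys are obtained (triggers, log mining, whatever), the cache side of it is simple enough to sketch. Here a plain dict stands in for memcached and `load_row` is a hypothetical function that reads one row from the database; neither comes from the talk.

```python
def get_row(cache: dict, pk, load_row):
    """Read-through lookup: serve from cache, else load and remember the row."""
    if pk not in cache:
        cache[pk] = load_row(pk)
    return cache[pk]


def invalidate(cache: dict, changed_pks) -> int:
    """Evict every changed primary key; return how many entries were dropped."""
    evicted = 0
    for pk in changed_pks:
        if pk in cache:
            del cache[pk]
            evicted += 1
    return evicted
```

After `invalidate`, the next `get_row` for a changed key misses the cache and goes back to the database, so the stale copy is never served again.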
Need to be able to allow application changes quickly and safely. Each topic has an example of a website that went tits up (I’m assuming these are not Oracle shops). Advert for Real Application Testing. Oracle has better performance diagnostics (AWR, ADDM, ASH); Oracle is more instrumented than mysql.
The number of shards and the data volume will increase. Working with lots of anything is more difficult. The ideal architecture would allow you to further partition each shard. Slagging off of mysql regarding partitioning.
Now mentioning partitioning – but I’m not sure how that helps you split a sharded db into 2 db’s. Exchange partitions is the answer to move partitions around. Partitioning is transparent to the application.
Monitoring many databases is tough – Grid Control is the answer. Nokia manage 500 dissimilar databases with 5 dbas.
Some quite skeptical questions regarding license fees – if you shard to 50 db’s you are paying 50 oracle licenses that’ll cost a bit more than mysql! Funnily, one of the questions on licensing costs was from an Oracle employee.