UKOUG RAC & HA SIG February 2010

Introduction

To Oracle UK HQ in Reading for the first RAC & HA SIG of 2010. I’ve been going to this SIG for quite a number of years, and one of the fantastic things about it is seeing some of the regular faces once more.

During the survey, as usual, the vast majority were running 2 nodes, with only a sprinkling on more than that; one attendee was running 6 nodes though.

Barely anybody running RAC on Windows. Quite a few on SPARC. Linux, obviously, in the majority.

Vast majority on 10.2, just one on 11.2. Most people running on Fibre Channel, and the vast majority on ASM. A handful using Veritas. As always, a large number of home-grown apps running against the RAC instances. Lots of Data Guard users.

Oracle Support Update – Phil Davies

The terminal release of 10gR2 will be 10.2.0.5, which may appear around Easter. Premier Support ends 31-July-2010, but there is free Extended Support for 1 year.

Crash Recovery Without Restoring – Arnoud Roth

A restore usually takes 150% of the time it takes to back up your database. This can be time consuming.

You could use standard RMAN functionality, or you could avoid restoring at all.

You can switch to a backup copy instead; this is not restoring, and it is a quick solution, even when the database server is beyond repair. You need 2 servers and a shared filesystem.

This is using the Flash Recovery Area (FRA), RMAN, and a shared filesystem. The technique requires two servers with access to the same filesystems, both DB and FRA, but you only ever have 1 server accessing the DB. On a crash of the first system, the DB and FRA are effectively switched on the second system; the datafiles are now running from the FRA.

The FRA has to be sized to hold a copy of the database plus incremental backups. The second system should obviously be identical to the first server apart from DB_CREATE_FILE_DEST and DB_RECOVERY_FILE_DEST.

He also suggests changing the CONTROL_FILES parameter.
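
To illustrate, a sketch of how the failover node's parameter file might differ from the primary, with the DB and FRA locations swapped (the paths and size here are hypothetical, not from the talk):

# failover node: datafiles will effectively live in what the primary calls the FRA
db_create_file_dest        = '/shared/fra'
db_recovery_file_dest      = '/shared/db'
db_recovery_file_dest_size = 500G
# point at the binary controlfile copy kept in the FRA
control_files              = '/shared/fra/control01.ctl'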

Backup scripts may need adjusting to reflect the new FRA location.

Essentially the database is incrementally backed up every night and a copy on the FRA is “rolled forward” with the incremental backup.
The current controlfile is also backed up to the FRA as a binary copy.
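
A minimal sketch of the kind of nightly RMAN job this implies, using incrementally updated backups (the tag name is my own assumption, not from the talk):

run {
  # roll the image copy in the FRA forward using the previous level 1 incremental
  recover copy of database with tag 'fra_copy';
  # take tonight's level 1 incremental (creates the image copy on the first run)
  backup incremental level 1 for recover of copy with tag 'fra_copy' database;
  # keep a binary copy of the current controlfile in the FRA
  backup as copy current controlfile;
}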

Upon crash, go to the failover node:

rman target /
startup mount;
switch database to copy;
recover database;
alter database open resetlogs;

You are done!

Ah, don’t forget to re-create your temporary tablespace!
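
If DB_CREATE_FILE_DEST is set on the failover node, a one-liner along these lines should do it (the tablespace name TEMP and the sizing are my assumptions):

alter tablespace temp add tempfile size 2g autoextend on;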

This is a very clever and cunning technique!

Time to recover depends on the amount of archived redo to apply, and on the number of datafiles that have to be switched to their copies.

He actually has his online redo logs in the FRA, so when the other server does the recovery it is able to recover right up to date without data loss.
After adding datafiles/tablespaces, a new incremental backup needs to be taken.

A couple of advantages over Data Guard: this will work with Standard Edition, and it may not need the second node to be licensed as you are only ever running 1 copy of Oracle.

I had a thought that he might even be able to get away without the switch to copy if, on the 2nd system, he actually switched the mount points of the devices; it will look to the 2nd system as if it is using the original datafiles and they need a recovery.

ASM Split in Extended RAC – Pavel Rabel

He had a situation where they pulled a cable on an extended RAC system: the database kept working, but when they put the cable back the database failed.

I think this was 11gR1.

Pull the interconnect and a node should get shot, leaving just 1 node. This was successful.

He then removed the connectivity (fibre) between the arrays, saying he did not know what the expected behaviour is in this scenario; he was expecting both nodes to go down.

However, when the disk arrays lost connectivity to each other, both instances stayed up and applications could update data, with disk errors in the alert logs and warnings about disks going to be dropped.

They are using ASM and ASM Fast Mirror Resync.

Metalink note 466326.1 covers ASM disk resync; it requires COMPATIBLE.ASM and COMPATIBLE.RDBMS both set >= 11.1.

DISK_REPAIR_TIME needs to be set > 0; the default is 3.6 hours.
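
For reference, the disk group settings this describes would look something like the following (the disk group name DATA is my assumption):

alter diskgroup data set attribute 'compatible.asm'   = '11.1';
alter diskgroup data set attribute 'compatible.rdbms' = '11.1';
alter diskgroup data set attribute 'disk_repair_time' = '3.6h';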

He seemed to shut down 1 node, then made the arrays visible to each other again.

He then shut down the 2nd node and could no longer start up the database.

He is trying to replicate the issue but so far has not managed to.

Application Consolidation on a 3 node Cluster – Martin Bach

Martin is a presentation machine – he is practically doing a SIG a month and is always worth listening to.

This was a presentation on a project whereby he took a system running on ancient hardware, and ancient software and moved to a 3 node cluster.

The previous system was not providing a good enough level of performance. Monthly release cycle.

Moving to a 3-node RAC with 4-way, 4-core processors. He currently has the system on 10gR2.

Will upgrade new apps to 11gR2, which necessitates an 11gR2 clusterware upgrade.

Can move the cluster registry to ASM.

Can move the voting disks to ASM, with restrictions on the number of disks depending on the redundancy of the diskgroup.

Must have compatible.asm set to 11.2.
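
For reference, the 11gR2 commands for this look roughly as follows (the disk group name +DATA and the old OCR path are hypothetical; the ocrconfig commands are run as root):

# move the voting disks into an ASM disk group
crsctl replace votedisk +DATA
# add an OCR location in ASM, then drop the old one
ocrconfig -add +DATA
ocrconfig -delete /u01/app/ocr/ocr_file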

Panel Discussion on Consolidation

A very interesting discussion on how various people have gone about consolidating databases, with lots of good questions from the audience. BA had an interesting story about how they have 25 two-node clusters; the thought was that perhaps having a larger number of nodes per cluster, with fewer actual clusters, might have been a better approach.

There were claims about RAC being useful for increasing I/O bandwidth, as multiple servers can have more channels to the storage, though this might not increase the bandwidth within the storage array itself.

This was an excellent part of the day, with the audience providing fantastic insights as well; as a vehicle for tapping the collective wisdom of the room it worked really well.

I hope this format continues.

Rolling Database Upgrade Real World Experience – Tomas Ramanauskas

You need a logical standby database. Tomas managed to get his downtime down to 15 minutes (it could have been much less, around 3 minutes, had he not had a listener issue).

This can be done from 10.1.0.3, but you must ensure you are only using datatypes supported by logical standby. Tomas went from 10.1.0.3 to 10.2.0.4.
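
A quick way to check that on the primary is the standard data dictionary view for SQL Apply restrictions:

select distinct owner, table_name
from   dba_logstdby_unsupported
order  by owner, table_name;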

He had some issues with the logical standby and had to apply a bundle patch on top of 10.2.0.4. Also, during the migration there were transactions running against the logical standby that could not be applied and had to be skipped!

He thinks it was worth the effort for future projects, even though there were lots of issues/bugs when performed on a more complex database. He thinks he will be using this technique in the future, and hopes later versions (particularly with transient logical standby) will make the whole process much easier.

Oracle RAC on Cisco UCS Hardware – Carl Bradshaw

Perhaps I should have taken a more open-minded approach to this type of presentation, but I have seen a few UCS presentations before and I don’t think I’ll be deploying on them soon. I’m afraid I tuned out for most of this one.

Stretched RAC cluster with Oracle 11gR2 – David Burnham

This sparked a really good discussion right at the end of the day. Dave was having issues with setting up stretched RAC in 11gR2, particularly with the voting disks in ASM; it seems you have much less control over where they are placed in ASM than if you use some other shared filesystem.
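
For reference, the standard way to see where the clusterware has actually placed the voting disks is:

crsctl query css votedisk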

Conclusion

All in all it was another excellent RAC & HA SIG. For me, if you are based in the UK you don’t want to miss these events; you always go away having picked up something.
