UKOUG RAC & HA SIG 02/10/08

Thankfully I had my excellent colleague, Arjan Van Der Meer, to drive me up to the Heritage Motor Centre on a bright sunny day.

It was quite a poor turnout really – down to 43. Probably the lowest RAC & HA SIG turnout, and much lower than the last time we were at this venue.

I did not have my laptop for the first section, which meant I failed to fully note down the results of the usual survey of who was using what, and the support update. There were no RAC-on-Windows users. Almost equal numbers of HP-UX and Solaris users, but as always these days Linux was in the majority.

A good number were using ASM, but a few were on OCFS as well.

What ASM Can Do For You – Me

I was really happy with my presentation; I'm sure I presented more fluently than previously at the UNIX SIG. This time I had set up the presenter tools in PowerPoint so I could see bullet reminders for each slide. I will do this for all future presentations – it made a huge difference, instead of trying to remember everything for every slide!

Of course, there were no bullets on the slides themselves.

An excellent number of questions were asked at the end.

You can actually access the presentation from the link in the title above. Watch out though – it’s 45MB!

Exadata Database Machine – John Nangle

John Nangle gave a presentation on the Exadata Storage Server and the HP Oracle Database Machine. One throwaway comment he made at the beginning seemed to indicate it was just HP for now, suggesting there may be more hardware vendors in future – though I could have read too much into that one.

There was a big push on the data warehousing angle, and a run-through of the speeds and feeds.

Interestingly, he talked about IORM – the I/O Resource Manager – which allows different databases accessing the Exadata storage to be given differing priorities for I/O. This can also be done on a per-user basis.
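For the record, a minimal sketch of what an inter-database IORM plan might look like from the cell – the PROD/DEV names are hypothetical and the syntax is from my reading of the CellCLI docs, so treat it as illustrative:

    # give PROD priority for I/O over DEV on this storage cell
    cellcli -e "ALTER IORMPLAN dbplan=((name=PROD, level=1, allocation=75), \
                                       (name=DEV,  level=2, allocation=25))"

The per-user angle maps onto Database Resource Manager consumer groups, as I understood it.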

There was a slide on offloading the processing of query predicates down to the storage, thereby reducing the amount of traffic/disk blocks sent back to the database server. This is referred to as a smart scan and is transparent to the application.

You have to move to Oracle 11.1.0.7 to utilise this.

He explained the concept of cell disks and grid disks.

Cell disks are logically partitioned into grid disks – this allows you to partition the disks in the storage server, e.g. a fast outer part and a slow inner part.
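A rough illustration of the idea via CellCLI (names and sizes made up; grid disks are allocated from the fast outer tracks first, as I understand it):

    # the first grid disk on each cell disk gets the fast outer portion
    cellcli -e "CREATE GRIDDISK ALL PREFIX=data_hot, size=250G"
    # a second grid disk then takes what is left on the slower inner portion
    cellcli -e "CREATE GRIDDISK ALL PREFIX=data_cold"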

A database can co-exist on both Exadata and non-Exadata storage, so some tablespaces can be on Exadata and some on non-Exadata.

A little bit of marketing with the usual telco beta customers.

Again, John mentioned that there may be more vendors – that seems quite surprising to me! I understand wanting to support more than Linux, but different hardware vendors – I'd have thought HP would have got an exclusive deal.

Interestingly, John claimed that a year ago it was intended to ship via more than one vendor.

Julian asked a question about upgrading the Exadata software itself; John said this has not been fully thought out yet.

Oooh – the explain plan shows TABLE ACCESS STORAGE FULL if it has benefited from a smart scan. Interesting that this is visible.
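Something like this should show it, assuming a hypothetical SALES table sitting on Exadata storage:

    sqlplus -s / as sysdba <<'EOF'
    EXPLAIN PLAN FOR
      SELECT COUNT(*) FROM sales WHERE amount_sold > 1000;
    SELECT * FROM TABLE(dbms_xplan.display);
    EOF
    # the full scan line in the plan output reads something like:
    # |   2 |   TABLE ACCESS STORAGE FULL| SALES | ...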

Oracle 10.2.0.4 – The Patch that ate my cluster – David Burnham

This is a war story about 10.2.0.4.

The customer Dave was working with was encouraged (forced) to upgrade to 10.2.0.4 due to a SQL execution plan bug. This was on a 4-node RAC cluster.

After the 10.2.0.4 upgrade there was cluster instability with regular node evictions.

On the evicted node there was almost no diagnostic information – this is classic oprocd behaviour.

Oprocd has been around on other UNIX platforms for a while, but only appeared on Linux in 10.2.0.4.

Oprocd manages node evictions.

Oprocd is much more aggressive than the Linux kernel hangcheck timer.

They changed diagwait to 13 within Clusterware – the clusterware stack has to be down to make the change.

However, changing diagwait changes the parameters that oprocd runs with.
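From what was said (and my own reading since), the change goes something like this – the whole clusterware stack must be stopped first:

    # as root, on every node in the cluster
    crsctl stop crs
    # then, from one node, set diagwait and bring the stack back up
    crsctl set css diagwait 13 -force
    crsctl start crs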

Why did the default settings not work? Dave was running on Red Hat with a 2.4 kernel and a NetApp NFS filer – Oracle Home, Clusterware and data all on the filer.

Dave was blaming network interrupts: oprocd runs at real-time priority, but it won't get priority over network interrupts.
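You can at least see the scheduling class oprocd runs with on Linux – a quick check along these lines (output will vary):

    # RR in the CLS column means the SCHED_RR real-time class
    ps -eo pid,class,rtprio,comm | grep oprocd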

Metalink note 567730.1.

Really good discussion afterwards – I pointed out it had happened to me with Red Hat 4 and SAN storage, and lots of people stated they had seen this as well.

RAC Certification Exam Options – Joel Goodman

About half the attendees are OCP certified.

There are various levels of certification:

OCA
OCP
OCE
OCM

Barely anyone in the room has their Oracle Clusterware managed by the SA team – it is all done by DBAs. That would not be the case if it was Veritas “clusterware”.

No OCEs or OCMs in the room.

Basically no one in the room is using Oracle Enterprise Linux.

Joel criticised the early exams (7, 8), which were farmed out to a third-party company to produce from the Oracle course docs.

Joel has written about 25% of the 10g RAC OCE exam.

There are experimental simulation questions on Enterprise Manager.

OCM exams are much more practically based, run over two days, but with access to the documentation.

Creation of a 12-node stretched cluster – Jason Hughes

Deployment history

First cluster Q1 2003 on 9i

90 clusters

Stretched clusters first done in 2005.

10 stretched clusters

A graph of the number of DBs deployed over time – it goes up to 500 DBs.

Mission

They wanted to control hardware growth and cost.

There was lots of idle hardware.

They wanted better uptime.

Wow – 2 x 10Gb links for the stretched cluster, with the sites separated by about 10km.

Using symbolic links pointing to the PowerPath devices rather than having to access the PowerPath devices directly.
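A sketch of the approach with made-up device names – ASM gets pointed at stable symlinks rather than at the PowerPath pseudo-devices themselves:

    # symlink the EMC PowerPath pseudo-devices into one directory for ASM
    ln -s /dev/emcpowera1 /u01/oracle/asmdisks/disk01
    ln -s /dev/emcpowerb1 /u01/oracle/asmdisks/disk02
    # the ASM instance can then discover them with:
    #   asm_diskstring = '/u01/oracle/asmdisks/*'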

They initially did not have 3 voting disks, just 1 voting disk – and they lost the site that contained it.

They complained that Oracle Support could not confirm what to use for the 3rd voting disk.

RACPACK came to the rescue and got basic NFS servers, rather than just NetApp, supported for the voting disk.
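The mount options matter for this – a sketch based on Oracle's guidance for voting disks over NFS (server name and paths hypothetical):

    # hard-mounted, no attribute caching, NFSv3 over TCP
    mount -t nfs -o rw,bg,hard,intr,rsize=32768,wsize=32768,tcp,noac,vers=3,timeo=600 \
          nfsfiler:/export/vote /voting3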

ASM metadata corruption meant ASM refused to start, with very little diagnostic information. Jason said the corruption did not occur on Red Hat 4.

They had an 8-9 hour outage while they dd'd the disks and rebuilt the databases. Oracle Support was called in.

An Oracle employee said that no one was using failure groups, and their management told them to remove them as it was not standard industry architecture.

Management were not happy with all databases being dependent on ASM.

Now using SRDF for mirroring the data across sites.

Running on 10.2.

He created multiple ASM instances, as the business did not want one ASM instance being able to bring down the entire DB estate.

Interesting – if they thought of it as, say, a Veritas file system hosting multiple DBs, would they be so concerned?

They seem not to have the mindset that ASM is a filesystem/volume manager, focusing much more on the fact that it needs an instance to run.

They created a backup VIP.

He mentioned a bug whereby restored voting disks come back quite a bit larger than when you backed them up – Metalink note 399482.1.
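For context, voting disk backups at this point were plain dd copies (device and file names hypothetical), which is where the size mismatch on restore bites:

    # backup the voting disk while the cluster is healthy
    dd if=/dev/raw/raw1 of=/backup/votedisk.bak
    # restore it later – per the note, watch that the restored copy
    # ends up the same size as the original
    dd if=/backup/votedisk.bak of=/dev/raw/raw1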

Monitoring is difficult – e.g. how do you know if you have lost a disk when your failure group is handling the problem?
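One way I'd check from the ASM instance (a sketch – the columns are from v$asm_disk):

    sqlplus -s / as sysdba <<'EOF'
    -- a disk that mirroring is quietly covering for will show up here
    SELECT group_number, name, path, mount_status, mode_status
    FROM   v$asm_disk
    WHERE  mode_status <> 'ONLINE' OR mount_status = 'MISSING';
    EOF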

They lost the 3rd site with the voting disk, but the primary site had other dependencies (NIS) on the 3rd site.

The default environment is a 2-node RAC cluster for any part of the business requesting a database.

Interesting discussion: BA were testing stretched clusters at the same time as RBS, but BA did far, far more testing. It almost seems like RBS were ahead of their time, deploying on 10gR1.

I'm afraid I skipped the last talk on Active Data Guard – I had seen it before. All in all though, another interesting UKOUG event :-)
