OpenWorld 2013

I’ve just come back from my 5th trip to Oracle OpenWorld. While there is something very special about your first trip to OpenWorld, this has been my favourite trip of the lot. In previous years I’ve blogged about what’s been going on every day, but these days twitter and @jarneil is where I tend to be more active, so instead I thought I’d give a general overview of my OpenWorld experiences this year.

I was quite lucky this year to get a presentation selected, “Exadata From Installation to Go Live: Experiences From Banking And Retail”. I was delighted by the number of attendees who turned up to it, well over 150 I reckon. While I have presented many times at various UKOUG events over the years, it feels slightly different presenting in the US: the attendees were very keen to ask questions and many came up to chat at the end. It certainly makes it all feel worthwhile as a presenter, and I’ll be submitting a presentation for next year for sure.

OakTable World

OakTable World has really become a staple of my OpenWorld experience: a conference within a conference, where some of the most knowledgeable folks in the Oracle community present in a bit more technical detail than you get in the general conference. It really was tremendously well organised by @kylehailey and it was a real privilege to be able to attend.

Announcements

The big announcement in the Larry Ellison keynote was of course the new Oracle Database In-Memory option. While it sounds like this is a long way from being production ready, it does sound like a potentially awesome feature. It did however make me wonder whether it might start to cannibalize Exadata sales: why worry about not having smart scans when you are querying your hot data in memory?

On the Exadata front, the conference was a bit of a damp squib. There was no new hardware announced, apparently because the recent Intel chip release came too late for Oracle, and there was no new Exadata Storage Server Software announced either. This really did take me by surprise: the current version has been around for 9 months or so, and I really expected an update. There were sneak peeks at what is coming, though, and there is some good stuff in the offing, in particular compression of data in flashcache, done in hardware on the card, which should essentially ensure you get more bang for your flashcache. There is also QoS for networks.

To ensure offloading occurs for different versions of databases running on the compute nodes, there is going to be a bit of an architectural change to the software running on the cells, with essentially a separate offloading process for each database version. It sounds like this may take a while to appear, and it’s disappointing, to say the least, that 12c has been released with no offloading support.

Interesting Presentations

I thought I’d mention a few of my favourite presentations that I saw, in addition to the OakTable World stuff above. First up is the optimizer bootcamp from Maria Colgan and Jonathan Lewis. It was not only highly educational but also superbly entertaining – definitely the outstanding presentation of the week. I also enjoyed “Oracle Exadata: What’s New, What’s Coming” by Juan Loaiza. Juan is an excellent presenter and comes across as highly knowledgeable about the product. Finally I’d like to mention “Solving the Toughest Oracle Exadata Problems by Using Tips from Oracle Development”, mostly presented by Scott Heisey. This really did give you a lot of tips on where to look and what to do when something goes wrong with your Exadata. If the slides become available, I’d highly recommend taking a look if you have to manage an Exadata box.

The Social Side

What truly made this year a memorable experience, though, was the social side. After 5 years I now know quite a few people, and it really is superb being able to talk to some of the smartest folks in the Oracle community. It is this that made it a great year: from the Oracle Ace Dinner and speaking to a whole bunch of Enkitec folks, to the Pythian party and one of the best stories I heard at OpenWorld from Karl Arao, to an e-dba dinner and some great stories about Oracle UK from James Anthony, to the great blogger meetup and speaking with Luca Canali, oh and not forgetting dinner at the Franciscan Crab, through to the finale with Martin Nash in the W and a couple of mojitos.

It really is the people that make Oracle OpenWorld a great conference to attend! So many thanks to @e_dba and the Oracle Ace program; without their support, I could not attend.

Cloning Oracle 12c Pluggable Databases

The wait is over and Oracle 12c has finally arrived. Of course there are a lot of interesting and useful-looking new features, but one of the eye-catchers is the major architectural shift to multi-tenancy with the Container Database and Pluggable Database features.

Actually creating a container database with pluggable databases is really straightforward, as shown here by Matthew Walden. I must say I’m liking the new dbca; it feels just that little bit nicer to use.

Of course, one of the big ideas behind multi-tenancy is to ramp up the amount of consolidation you can squeeze out of a given server. Using a multi-tenant database you can consolidate more databases onto a given amount of server memory and CPU. I’m doing a lot of that on Exadata at the moment, and I can see multi-tenancy being really useful there in certain scenarios. However, I’m not expecting 12c Exadata support anytime soon!

The other big driver for multi-tenancy is manageability – the abstraction of user data from the Oracle-supplied metadata enables a lot of possibilities that did not exist previously. What I wanted to show in this post is how easy it is to create a clone of a pluggable database.

I’ve worked a lot with developers in the past, and one of the things I learned during this time is that developers can’t get enough copies of the database they are working on. I don’t think this is a bad thing. In fact, I always took it as read that as the DBA it was my role to help developers write the best code they could; it certainly makes for less painful releases. So I was all in favour of giving developers the tools and resources they needed to develop good code, and that very much included copies of the databases they were working on. Testing is a great thing!


[oracle@localhost orcl]$ sq

SQL*Plus: Release 12.1.0.1.0 Production on Wed Jul 3 09:05:10 2013

Copyright (c) 1982, 2013, Oracle.  All rights reserved.


Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.1.0 - 64bit Production
With the Partitioning, OLAP, Advanced Analytics and Real Application Testing options

SQL> select CON_ID, NAME, OPEN_MODE from V$PDBS;

    CON_ID NAME                           OPEN_MODE
---------- ------------------------------ ----------
         2 PDB$SEED                       READ ONLY
         3 PDBORCL                        READ ONLY

So I’ve connected to my container database and I’m looking at what pluggable databases already exist. It just happens I have 2 at the moment, both currently read only. We have the seed database PDB$SEED, which is always present in a container database, and we currently have 1 pluggable database, PDBORCL. That database just happens to be read only at the moment, but this is handy because:

Currently you can only create a clone of a pluggable database if that pluggable database is read only. This seems a bit of a limitation to me, and I expect it will be lifted in a future release.
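
For reference, flipping an existing pluggable database to read only is itself just a couple of commands. A minimal sketch (close immediate will of course kick out any connected sessions):

SQL> alter pluggable database PDBORCL close immediate;
SQL> alter pluggable database PDBORCL open read only;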

Apart from that, though, I can’t emphasise enough how simple it is to create a clone of a pluggable database. It’s a single command:

SQL> create pluggable database PDB1 from PDBORCL FILE_NAME_CONVERT= ('/oradata/orcl/pdborcl', '/oradata/orcl/pdb1');

Pluggable database created.

Elapsed: 00:00:10.67

Simple. This clone took 10 seconds, and the database being cloned was 1.3G in size. The create pluggable database command has a lot of options, but this was the simplest syntax I could see for creating a clone. You need the FILE_NAME_CONVERT clause so that the datafiles from the pluggable database you are cloning can be put somewhere else. And this is something to bear in mind: it really is copying the datafiles from the pluggable database you are cloning, and that could mean a lot of space.
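
As an aside, if you would rather not type the FILE_NAME_CONVERT clause on every clone, my understanding is that the PDB_FILE_NAME_CONVERT initialization parameter can be set at session level to provide the same mapping. A sketch I have not tested exhaustively (same source and target paths as above):

SQL> alter session set pdb_file_name_convert = '/oradata/orcl/pdborcl', '/oradata/orcl/pdb1';
SQL> create pluggable database PDB1 from PDBORCL;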



SQL> select CON_ID, NAME, OPEN_MODE from V$PDBS;

    CON_ID NAME                           OPEN_MODE
---------- ------------------------------ ----------
         2 PDB$SEED                       READ ONLY
         3 PDBORCL                        READ ONLY
         4 PDB1                           MOUNTED

Elapsed: 00:00:00.00

We can see the pluggable database PDB1 above is created in the mounted state, but it’s easy to get it opened:


SQL> alter pluggable database PDB1 open;

Pluggable database altered.

Elapsed: 00:00:03.88
SQL> select CON_ID, NAME, OPEN_MODE from V$PDBS;

    CON_ID NAME                           OPEN_MODE
---------- ------------------------------ ----------
         2 PDB$SEED                       READ ONLY
         3 PDBORCL                        READ ONLY
         4 PDB1                           READ WRITE

Elapsed: 00:00:00.01

How I can see this being used for developers: you clone your production environment onto your test environment, with all masking etc. done. You keep this copy as your “master developer copy” and make it read only, and from it you can create as many copies for your developers as you wish. If a developer makes a mistake it’s easy to drop the pluggable database and create another one:

SQL> drop pluggable database PDB1 including datafiles;
drop pluggable database PDB1 including datafiles
*
ERROR at line 1:
ORA-65025: Pluggable database PDB1 is not closed on all instances.


Elapsed: 00:00:00.05

SQL> alter pluggable database PDB1 close;

Pluggable database altered.

Elapsed: 00:00:00.31

SQL> drop pluggable database PDB1 including datafiles;

Pluggable database dropped.

Elapsed: 00:00:00.15

So we can see we have to have the pluggable database closed before we can drop it. To actually drop it took hardly any time at all, subsecond!

As I alluded to earlier, one of the big drawbacks of this method is that the space requirements can become vast if you start making lots of clones of large databases. What I’ve used in the past is snapshots, to drastically reduce the storage requirements when creating tens of copies of the same database: you essentially store 1 copy of the full datafiles and just the deltas of changes for each of the copies.

It’s great to see there is a snapshot option on the create pluggable database command! However, you’ve got to be on the right storage: ACFS or Direct NFS. I’m looking forward to testing this soon.
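
Based on the documented syntax, the command should look something like the following. PDB2 and its path are just hypothetical names for this sketch, and the underlying storage must support snapshots as mentioned above:

SQL> create pluggable database PDB2 from PDBORCL
     FILE_NAME_CONVERT= ('/oradata/orcl/pdborcl', '/oradata/orcl/pdb2') snapshot copy;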

The new multi-tenancy architecture is a major shift for the Oracle database, and easy cloning of pluggable databases is just one of the possibilities it opens up.

Recovering from rm -rf on a datafile

This blog posting is entirely the credit of Frits Hoogland, who pointed out to me that this was possible.

I managed to delete a datafile that, while it had been created in a completely stupid location, was very much in use. I’m sure we’ve all been there; I mean, who hasn’t run rm -rf somewhere they shouldn’t have? When I subsequently checked the database and saw no flashback database, I realised this was going to mean recreating the database. It was non-production of course, so no real big deal, but still a bit of work nonetheless. Thanks to Frits’ suggestion, though, I could have saved myself some of that work!

First let’s create a tablespace, in a really dumb location:

SQL> create tablespace temp_test datafile '/tmp/temp_test01.dbf' size 10M;

Tablespace created.

SQL> conn temp_test/temp_test
Connected.
SQL> create table test_table (key number(1));

Table created.

SQL> insert into test_table values (7);

1 row created.

SQL> commit;

Commit complete.

SQL> select * from test_table;

       KEY
----------
         7


So we have a schema temp_test with a simple table that was created in the temp_test datafile (this being the default tablespace of the temp_test user). Now we are going to remove this temp_test datafile that is in the /tmp directory:

[oracle@localhost fd]$ rm -rf /tmp/temp_test01.dbf 

It’s really gone. Let’s test that we can’t read from that table again:

[oracle@localhost fd]$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Tue Apr 23 10:50:23 2013

Copyright (c) 1982, 2011, Oracle.  All rights reserved.


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning option

SQL> conn temp_test/temp_test
Connected.
SQL> select * from test_table;

       KEY
----------
         7

SQL> conn / as sysdba
Connected.
SQL> alter system flush buffer_cache;

System altered.

SQL> conn temp_test/temp_test
Connected.
SQL> select * from test_table;
select * from test_table
*
ERROR at line 1:
ORA-01116: error in opening database file 6
ORA-01110: data file 6: '/tmp/temp_test01.dbf'
ORA-27041: unable to open file
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

For a simple table already buffered, you can still retrieve the data from cache, but as soon as you are forced to attempt a read from disk you are going to error. Oops! That’s not good. So what are the options? Flashback database was my first thought when this happened for real, but that was not available. Backups? Well, Frits pointed out a really neat way:

First find the PID of a process that we know will have a file descriptor open for the file we have just removed. Database writer is a good candidate:

[oracle@localhost tmp]$ ps -ef|grep dbw
oracle    3281     1  0 02:19 ?        00:00:04 ora_dbw0_TEST

We could do an lsof on this PID to confirm this is the case:

[oracle@localhost fd]$ /usr/sbin/lsof -p 3281
.
.
.
oracle  3281 oracle  mem    REG  253,0    144776 9919352 /lib64/ld-2.5.so
oracle  3281 oracle  mem    REG  253,0   1722328 9919353 /lib64/libc-2.5.so
oracle  3281 oracle  mem    REG  253,0    615136 9919361 /lib64/libm-2.5.so
oracle  3281 oracle  mem    REG  253,0     22368 8109231 /usr/lib64/libnuma.so.1
oracle  3281 oracle  mem    REG  253,0     23360 9919355 /lib64/libdl-2.5.so
oracle  3281 oracle  mem    REG  253,0    145872 9919356 /lib64/libpthread-2.5.so
oracle  3281 oracle  mem    REG  253,0     53448 9919357 /lib64/librt-2.5.so
oracle  3281 oracle  mem    REG  253,0    114352 9919367 /lib64/libnsl-2.5.so
oracle  3281 oracle  mem    REG  253,0     58949 6620956 /u01/app/oracle/product/11.2.0/dbhome_1/lib/libnque11.so
oracle  3281 oracle  mem    CHR    1,5              4172 /dev/zero
oracle  3281 oracle  mem    REG  253,0      1544 6621005 /u01/app/oracle/product/11.2.0/dbhome_1/dbs/hc_TEST.dat
oracle  3281 oracle  mem    REG  253,0     53880 9919035 /lib64/libnss_files-2.5.so
oracle  3281 oracle  mem    REG  253,0      3768 8091976 /usr/lib64/libaio.so.1.0.1
oracle  3281 oracle  mem    REG  253,0    153574 6633914 /u01/app/oracle/product/11.2.0/dbhome_1/lib/libocrutl11.so
oracle  3281 oracle  mem    REG  253,0   3319072 6633913 /u01/app/oracle/product/11.2.0/dbhome_1/lib/libocrb11.so
oracle  3281 oracle  mem    REG  253,0   1590995 6633912 /u01/app/oracle/product/11.2.0/dbhome_1/lib/libocr11.so
oracle  3281 oracle  mem    REG  253,0     12755 6618701 /u01/app/oracle/product/11.2.0/dbhome_1/lib/libskgxn2.so
oracle  3281 oracle  mem    REG  253,0  17319952 6633880 /u01/app/oracle/product/11.2.0/dbhome_1/lib/libhasgen11.so
oracle  3281 oracle  mem    REG  253,0    161764 6618402 /u01/app/oracle/product/11.2.0/dbhome_1/lib/libdbcfg11.so
oracle  3281 oracle  mem    REG  253,0    228765 6633882 /u01/app/oracle/product/11.2.0/dbhome_1/lib/libclsra11.so
oracle  3281 oracle  mem    REG  253,0   7955322 6634146 /u01/app/oracle/product/11.2.0/dbhome_1/lib/libnnz11.so
oracle  3281 oracle  mem    REG  253,0   1010297 6620981 /u01/app/oracle/product/11.2.0/dbhome_1/lib/libskgxp11.so
oracle  3281 oracle  mem    REG  253,0    589359 6634326 /u01/app/oracle/product/11.2.0/dbhome_1/lib/libcell11.so
oracle  3281 oracle  mem    REG  253,0     12259 6615927 /u01/app/oracle/product/11.2.0/dbhome_1/lib/libodmd11.so
oracle  3281 oracle    0r   CHR    1,3       0t0    4170 /dev/null
oracle  3281 oracle    1w   CHR    1,3       0t0    4170 /dev/null
oracle  3281 oracle    2w   CHR    1,3       0t0    4170 /dev/null
oracle  3281 oracle    3r   CHR    1,3       0t0    4170 /dev/null
oracle  3281 oracle    4r   CHR    1,3       0t0    4170 /dev/null
oracle  3281 oracle    5u   REG  253,0      1544 6621005 /u01/app/oracle/product/11.2.0/dbhome_1/dbs/hc_TEST.dat
oracle  3281 oracle    6r   CHR    1,3       0t0    4170 /dev/null
oracle  3281 oracle    7r   CHR    1,3       0t0    4170 /dev/null
oracle  3281 oracle    8r   CHR    1,3       0t0    4170 /dev/null
oracle  3281 oracle    9r   CHR    1,3       0t0    4170 /dev/null
oracle  3281 oracle   10r   CHR    1,5       0t0    4172 /dev/zero
oracle  3281 oracle   11r   CHR    1,5       0t0    4172 /dev/zero
oracle  3281 oracle   12u   REG  253,0      1544 6621005 /u01/app/oracle/product/11.2.0/dbhome_1/dbs/hc_TEST.dat
oracle  3281 oracle   13r   REG  253,0   1101312 6627555 /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/mesg/oraus.msb
oracle  3281 oracle   14r   DIR    0,3         0   14239 /proc/3281/fd
oracle  3281 oracle   15r   CHR    1,5       0t0    4172 /dev/zero
oracle  3281 oracle   16u   REG  253,0      1544 6621005 /u01/app/oracle/product/11.2.0/dbhome_1/dbs/hc_TEST.dat
oracle  3281 oracle   17uR  REG  253,0        24 9919350 /u01/app/oracle/product/11.2.0/dbhome_1/dbs/lkTEST
oracle  3281 oracle   18r   REG  253,0   1101312 6627555 /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/mesg/oraus.msb
oracle  3281 oracle   19u  IPv6  87410       0t0     UDP *:19457 
oracle  3281 oracle  256u   REG  253,0   9748480 6514798 /u01/app/oracle/oradata/TEST/control01.ctl
oracle  3281 oracle  257u   REG  253,0   9748480 6514799 /u01/app/oracle/fast_recovery_area/TEST/control02.ctl
oracle  3281 oracle  258uW  REG  253,0 754982912 6514790 /u01/app/oracle/oradata/TEST/system01.dbf
oracle  3281 oracle  259uW  REG  253,0 629153792 6514791 /u01/app/oracle/oradata/TEST/sysaux01.dbf
oracle  3281 oracle  260uW  REG  253,0 120594432 6514792 /u01/app/oracle/oradata/TEST/undotbs01.dbf
oracle  3281 oracle  261uW  REG  253,0   5251072 6514793 /u01/app/oracle/oradata/TEST/users01.dbf
oracle  3281 oracle  262uW  REG  253,0 362422272 6515122 /u01/app/oracle/oradata/TEST/example01.dbf
oracle  3281 oracle  263uW  REG  253,0  30416896 6514823 /u01/app/oracle/oradata/TEST/temp01.dbf
oracle  3281 oracle  264uW  REG  253,0  10493952 5761580 /tmp/temp_test01.dbf (deleted)

I’ve missed out lots of /dev/shm entries in the above. We can see on the last line the /tmp/temp_test01.dbf datafile, and we see it’s marked as deleted. We can also see it has file descriptor 264. You can also see this from the following directory:

[oracle@localhost fd]$ ls -ltar /proc/3281/fd/
total 0
dr-xr-xr-x 7 oracle oinstall  0 Mar  6 13:38 ..
dr-x------ 2 oracle oinstall  0 Mar  6 13:38 .
lr-x------ 1 oracle oinstall 64 Apr 23 10:45 9 -> /dev/null
lr-x------ 1 oracle oinstall 64 Apr 23 10:45 8 -> /dev/null
lr-x------ 1 oracle oinstall 64 Apr 23 10:45 7 -> /dev/null
lr-x------ 1 oracle oinstall 64 Apr 23 10:45 6 -> /dev/null
lrwx------ 1 oracle oinstall 64 Apr 23 10:45 5 -> /u01/app/oracle/product/11.2.0/dbhome_1/dbs/hc_TEST.dat
lr-x------ 1 oracle oinstall 64 Apr 23 10:45 4 -> /dev/null
lr-x------ 1 oracle oinstall 64 Apr 23 10:45 3 -> /dev/null
lrwx------ 1 oracle oinstall 64 Apr 23 10:45 263 -> /u01/app/oracle/oradata/TEST/temp01.dbf
lrwx------ 1 oracle oinstall 64 Apr 23 10:45 262 -> /u01/app/oracle/oradata/TEST/example01.dbf
lrwx------ 1 oracle oinstall 64 Apr 23 10:45 261 -> /u01/app/oracle/oradata/TEST/users01.dbf
lrwx------ 1 oracle oinstall 64 Apr 23 10:45 260 -> /u01/app/oracle/oradata/TEST/undotbs01.dbf
lrwx------ 1 oracle oinstall 64 Apr 23 10:45 259 -> /u01/app/oracle/oradata/TEST/sysaux01.dbf
lrwx------ 1 oracle oinstall 64 Apr 23 10:45 258 -> /u01/app/oracle/oradata/TEST/system01.dbf
lrwx------ 1 oracle oinstall 64 Apr 23 10:45 257 -> /u01/app/oracle/fast_recovery_area/TEST/control02.ctl
lrwx------ 1 oracle oinstall 64 Apr 23 10:45 256 -> /u01/app/oracle/oradata/TEST/control01.ctl
l-wx------ 1 oracle oinstall 64 Apr 23 10:45 2 -> /dev/null
lrwx------ 1 oracle oinstall 64 Apr 23 10:45 19 -> socket:[87410]
lr-x------ 1 oracle oinstall 64 Apr 23 10:45 18 -> /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/mesg/oraus.msb
lrwx------ 1 oracle oinstall 64 Apr 23 10:45 17 -> /u01/app/oracle/product/11.2.0/dbhome_1/dbs/lkTEST
lrwx------ 1 oracle oinstall 64 Apr 23 10:45 16 -> /u01/app/oracle/product/11.2.0/dbhome_1/dbs/hc_TEST.dat
lr-x------ 1 oracle oinstall 64 Apr 23 10:45 15 -> /dev/zero
lr-x------ 1 oracle oinstall 64 Apr 23 10:45 14 -> /proc/3281/fd
lr-x------ 1 oracle oinstall 64 Apr 23 10:45 13 -> /u01/app/oracle/product/11.2.0/dbhome_1/rdbms/mesg/oraus.msb
lrwx------ 1 oracle oinstall 64 Apr 23 10:45 12 -> /u01/app/oracle/product/11.2.0/dbhome_1/dbs/hc_TEST.dat
lr-x------ 1 oracle oinstall 64 Apr 23 10:45 11 -> /dev/zero
lr-x------ 1 oracle oinstall 64 Apr 23 10:45 10 -> /dev/zero
l-wx------ 1 oracle oinstall 64 Apr 23 10:45 1 -> /dev/null
lr-x------ 1 oracle oinstall 64 Apr 23 10:45 0 -> /dev/null
lrwx------ 1 oracle oinstall 64 Apr 23 10:48 264 -> /tmp/temp_test01.dbf (deleted)

Here we have used the PID of the dbw process. Again we see the file is marked as deleted, and again it’s FD 264. Now for the idea Frits mentioned, and we can start actually recovering the data:

[oracle@localhost fd]$ cat 264 > /tmp/test01.dbf

From the /proc/3281/fd/ directory I ran a cat on the file descriptor number and sent the output to another filename, test01.dbf. Now you have recovered the data, you can switch the tablespace to use this datafile:

SQL> alter database datafile '/tmp/temp_test01.dbf' offline;

Database altered.

SQL> alter database rename file '/tmp/temp_test01.dbf' to '/tmp/test01.dbf';

Database altered.

SQL> alter database datafile '/tmp/test01.dbf' online;
alter database datafile '/tmp/test01.dbf' online
*
ERROR at line 1:
ORA-01113: file 6 needs media recovery
ORA-01110: data file 6: '/tmp/test01.dbf'


SQL> recover datafile '/tmp/test01.dbf';
Media recovery complete.
SQL> alter database datafile '/tmp/test01.dbf' online;

Database altered.

SQL> select * from temp_test.test_table;

       KEY
----------
         7

This is such a cool trick. I would still consider it a very last option, particularly in a production environment. In fact, you don’t really want to be creating tablespaces in the /tmp directory, or any other stupid location, in the first place!

Online patching: The Good, the Bad, and the Ugly

I’ve worked on 24×7 systems for more than a decade, and I have a real dislike of downtime. For one, it can be a real pain to agree any downtime with the business, and while RAC can and does help when you do work in a rolling fashion, there is still risk.

The promise of online patching has been around for a long time, and it is only recently that I dipped my toe in the water with it. Unfortunately, online patches are not a panacea, and in this blog posting I’m going to share some of the downsides.

Of course not all patches can be applied online. If a patch can be, the README associated with it will have an online section on how to apply the patch, and when you uncompress the patch there will be a directory called online.
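
A quick way to check is simply to look at the unzipped patch directory. The listing below is illustrative rather than taken from a real patch, but an online-capable patch will contain that online directory:

bash-3.2$ ls 10219624
README.txt  etc  files  online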

The Good

So first for the good side: the actual application truly can be done online, and in that sense it does what it says on the tin. Here I’m running from the unzipped patch directory, and in this example I’m using patch 10219624:

bash-3.2$ /u01/app/oracle/product/11.2.0/db_1/OPatch/opatch apply online -connectString TESTRAC1 -ocmrf /tmp/ocm.rsp 
Oracle Interim Patch Installer version 11.2.0.3.3
Copyright (c) 2012, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/11.2.0/db_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/11.2.0/db_1/oraInst.loc
OPatch version    : 11.2.0.3.3
OUI version       : 11.2.0.3.0
Log file location : /u01/app/oracle/product/11.2.0/db_1/cfgtoollogs/opatch/10219624_Jan_24_2013_08_54_08/apply2013-01-24_08-54-08AM_1.log

Applying interim patch '10219624' to OH '/u01/app/oracle/product/11.2.0/db_1'
Verifying environment and performing prerequisite checks...
All checks passed.
Backing up files...

Patching component oracle.rdbms, 11.2.0.3.0...
Installing and enabling the online patch 'bug10219624.pch', on database 'TESTRAC1'.


Verifying the update...

Patching in all-node mode.

Updating nodes 'rac2' 
   Apply-related files are:
     FP = "/u01/app/oracle/product/11.2.0/db_1/.patch_storage/10219624_Dec_20_2012_02_13_54/rac/copy_files.txt"
     DP = "/u01/app/oracle/product/11.2.0/db_1/.patch_storage/10219624_Dec_20_2012_02_13_54/rac/copy_dirs.txt"
     MP = "/u01/app/oracle/product/11.2.0/db_1/.patch_storage/10219624_Dec_20_2012_02_13_54/rac/make_cmds.txt"
     RC = "/u01/app/oracle/product/11.2.0/db_1/.patch_storage/10219624_Dec_20_2012_02_13_54/rac/remote_cmds.txt"

Instantiating the file "/u01/app/oracle/product/11.2.0/db_1/.patch_storage/10219624_Dec_20_2012_02_13_54/rac/copy_files.txt.instantiated" by replacing $ORACLE_HOME in "/u01/app/oracle/product/11.2.0/db_1/.patch_storage/10219624_Dec_20_2012_02_13_54/rac/copy_files.txt" with actual path.
Propagating files to remote nodes...
Instantiating the file "/u01/app/oracle/product/11.2.0/db_1/.patch_storage/10219624_Dec_20_2012_02_13_54/rac/copy_dirs.txt.instantiated" by replacing $ORACLE_HOME in "/u01/app/oracle/product/11.2.0/db_1/.patch_storage/10219624_Dec_20_2012_02_13_54/rac/copy_dirs.txt" with actual path.
Propagating directories to remote nodes...
Patch 10219624 successfully applied
Log file location: /u01/app/oracle/product/11.2.0/db_1/cfgtoollogs/opatch/10219624_Jan_24_2013_08_54_08/apply2013-01-24_08-54-08AM_1.log

OPatch succeeded.

I’m applying this to a 2-node 11gR2 RAC cluster. You’ll notice that it is applied on ALL nodes: you can’t apply an online patch in RAC to just one node at a time, and you can’t roll back one node at a time either. Also be aware that while the patch is in the Oracle home on all nodes in the cluster, it has only been applied to the local instance.

Now, I know you are meant to give connection string details like username/password, and can then apply to all instances in a cluster at the same time, but on some systems I work on, I do not have this information, and rely on OS authentication only. This can lead to some pain.
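
For completeness, when you do have the credentials, my recollection is that the online patch READMEs describe a connect string of roughly the form SID:username:password:node, one entry per instance, so that a single apply covers the whole cluster. Treat the following as a hedged sketch of that format rather than gospel, and check the README of your particular patch:

bash-3.2$ /u01/app/oracle/product/11.2.0/db_1/OPatch/opatch apply online -connectString TESTRAC1:SYS:syspassword:rac1,TESTRAC2:SYS:syspassword:rac2 -ocmrf /tmp/ocm.rsp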

You can tell a patch is applied with the following:

SQL> oradebug patch list

Patch File Name                                   State
================                                =========
bug10219624.pch                                  ENABLED

However, on the remote node:

SQL> oradebug patch list

Patch File Name                                   State
================                                =========
No patches currently installed

I accept this need not arise if you are able to authenticate properly at installation time. To fix this up you can do the following:

-bash-3.2$ /u01/app/oracle/product/11.2.0/db_1/OPatch/opatch util enableonlinepatch -connectString TESTRAC2 -id 10219624
Oracle Interim Patch Installer version 11.2.0.3.3
Copyright (c) 2012, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/11.2.0/db_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/11.2.0/db_1/oraInst.loc
OPatch version    : 11.2.0.3.3
OUI version       : 11.2.0.3.0
Log file location : /u01/app/oracle/product/11.2.0/db_1/cfgtoollogs/opatch/opatch2013-01-24_09-47-08AM_1.log

Invoking utility "enableonlinepatch"
Installing and enabling the online patch 'bug10219624.pch', on database 'TESTRAC2'.


OPatch succeeded.

The Bad

I’ve found rolling back to be slightly more problematic on the remote node with OS authentication. The rollback always removed the patch from the home across all nodes, and always removed it from the instance on the local node. While there is an opatch method documented to then disable the patch in an instance, very similar to the enableonlinepatch above (it’s Disableonlinepatch), I found it did not work with some patches: though opatch reported success, the patch was still enabled.
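
By analogy with the enableonlinepatch call shown earlier, the disable invocation looks roughly like this (a sketch; as noted, it reported success for me but did not always actually disable the patch):

-bash-3.2$ /u01/app/oracle/product/11.2.0/db_1/OPatch/opatch util disableonlinepatch -connectString TESTRAC2 -id 10219624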

Another point to note: restarting an instance does not remove an online applied patch. There is a directory under the $ORACLE_HOME called hpatch that holds the online applied patch libraries.

To get round this I had to resort to the following oradebug commands:

SQL> oradebug patch list

Patch File Name                                   State
================                                =========
bug10219624.pch                                  ENABLED

SQL> oradebug patch disable bug10219624.pch
Statement processed.
SQL> oradebug patch list

Patch File Name                                   State
================                                =========
bug10219624.pch                                  DISABLED

SQL> oradebug patch remove bug10219624.pch
Statement processed.
SQL> oradebug patch list

Patch File Name                                   State
================                                =========
bug10219624.pch                                  REMOVED

The oradebug patch list output showing REMOVED then reverts to “No patches currently installed” upon instance restart.

The Ugly

This really caught me out: patches applied online are completely incompatible with a subsequent run of opatch auto. I had a situation recently whereby I had applied a patch online and then later wanted to run opatch auto to apply further patches. Before running opatch auto I always run the check for conflicts, and this did not give me a clue that opatch auto would not work with the online applied patch.

However when I ran opatch auto on Bundle Patch 11 the following occurred:

[Jan 16, 2013 9:19:16 AM]    OUI-67303:
                             Patches [   14632268   12880299   13734832 ] will be rolled back.
[Jan 16, 2013 9:19:16 AM]    Do you want to proceed? [y|n]
[Jan 16, 2013 9:19:19 AM]    Y (auto-answered by -silent)
[Jan 16, 2013 9:19:19 AM]    User Responded with: Y
[Jan 16, 2013 9:19:19 AM]    OPatch continues with these patches:   14474780
[Jan 16, 2013 9:19:19 AM]    OUI-67073:UtilSession failed:
                             OPatch cannot roll back an online patch while applying a regular patch.
                             Please rollback the online patch(es) " 14632268" manually, and then apply the regular patch(es) " 14474780".
[Jan 16, 2013 9:19:19 AM]    --------------------------------------------------------------------------------
[Jan 16, 2013 9:19:19 AM]    The following warnings have occurred during OPatch execution:
[Jan 16, 2013 9:19:19 AM]    1) OUI-67303:
                             Patches [   14632268   12880299   13734832 ] will be rolled back.
[Jan 16, 2013 9:19:19 AM]    --------------------------------------------------------------------------------
[Jan 16, 2013 9:19:19 AM]    Finishing UtilSession at Wed Jan 16 09:19:19 GMT 2013
[Jan 16, 2013 9:19:19 AM]    Log file location: /u01/app/ora/product/11.2.0.3/db_1/cfgtoollogs/opatch/opatch2013-01-16_09-19-08AM_1.log
[Jan 16, 2013 9:19:19 AM]    Stack Description: java.lang.RuntimeException:
                             OPatch cannot roll back an online patch while applying a regular patch.
                             Please rollback the online patch(es) " 14632268" manually, and then apply the regular patch(es) " 14474780"

Yes, it’s not that difficult to fix up; the frustrating thing here is that the prerequisite checks did not show any issues. It’s pretty clear that either the opatch auto developers have not given any thought to how to properly handle an online applied patch, or the online patching developers have not considered the consequences of online patching for a future opatch auto.
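
For the record, getting past it just meant rolling back the online patch by hand, as the message says, and then re-running opatch auto. Roughly (a sketch; substitute your own Oracle home and instance name in the connect string):

bash-3.2$ $ORACLE_HOME/OPatch/opatch rollback -id 14632268 -connectString TESTRAC1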

Online patching is almost like the holy grail: nobody wants downtime. But I just don’t think the current online patching technique is quite there yet, and it really doesn’t play well at all with opatch auto.

Observing Exadata HCC compression changes when adding columns

This blog posting is very much a follow-on from the previous entry on how data compressed with Exadata HCC compression behaves under changing table definitions. Many thanks to Greg Rahn for his comments on that entry suggesting a simple mechanism for determining whether the compression level has changed or not.

In this blog posting we add a column to an HCC compressed table and we observe whether the number of blocks in the table changes or not.

As Greg stated in the comments on the previous blog entry, we have 3 possibilities for adding a column:

  1. add column
  2. add column with a default value
  3. add column with a default value but also specify as not null

We start with the same table as in the previous entry:

SQL : db01> create table t_part (
username varchar2(30),
user_id number,
created date )
partition by range (created)
(partition p_2009 values less than (to_date('31-DEC-2009', 'dd-MON-YYYY')) tablespace users,
partition p_2010 values less than (to_date('31-DEC-2010', 'dd-MON-YYYY')) tablespace users,
partition p_2011 values less than (to_date('31-DEC-2011', 'dd-MON-YYYY')) tablespace users,
partition p_2012 values less than (to_date('31-DEC-2012', 'dd-MON-YYYY')) tablespace users )

/

Table created.

SQL : db01> alter table t_part compress for query high

/

Table altered.
SQL : db01> insert /*+ APPEND */ into t_part select * from all_users

488 rows created.

SQL : db01> commit;

Commit complete.

So now we gather stats on the table and see how many blocks the table is consuming:

SQL : db01> exec DBMS_STATS.gather_table_stats(ownname => 'SYS',tabname => 'T_PART', estimate_percent => 100);
PL/SQL procedure successfully completed.

SQL : db01> select table_name, blocks, empty_blocks, avg_row_len , last_analyzed from dba_tables where table_name='T_PART';

TABLE_NAME BLOCKS EMPTY_BLOCKS   AVG_ROW_LEN LAST_ANAL
---------- ------ -------------- ---------- ------------
T_PART        60      0          20         18-MAY-12

This will be our starting point for each of the 3 ways of adding a column. We will always start with this table consuming 60 blocks. We will then add the column and then determine how many blocks the table is consuming after the column has been added.

If the table has undergone decompression from HCC compression, the number of blocks will go up; conversely, if it has not, the number of blocks will remain static.

First we try just adding a column, no default value:

SQL : db01> alter table t_part add city varchar2(30);

Table altered.

SQL : db01> exec DBMS_STATS.gather_table_stats(ownname => 'SYS', tabname => 'T_PART', estimate_percent => 100);

PL/SQL procedure successfully completed.
SQL : db01> select table_name, blocks, empty_blocks, avg_row_len , last_analyzed from dba_tables where table_name='T_PART';

TABLE_NAME BLOCKS EMPTY_BLOCKS AVG_ROW_LEN LAST_ANAL
---------- ------ ----------  ---------- ------------
T_PART        60      0          20         18-MAY-12

So this method has not changed the number of blocks; it’s just a dictionary change. We then drop the table with the purge option and recreate it back to the starting point of 60 blocks. Next we try adding the column with a default value:

SQL : db01> alter table t_part add city varchar2(30) default 'Oxford';
Table altered.
SQL : db01> exec DBMS_STATS.gather_table_stats(ownname => 'SYS', tabname => 'T_PART', estimate_percent => 100);

PL/SQL procedure successfully completed.
SQL : db01>select table_name, blocks, empty_blocks, avg_row_len , last_analyzed from dba_tables where table_name='T_PART';

TABLE_NAME   BLOCKS  EMPTY_BLOCKS  AVG_ROW_LEN  LAST_ANAL
------------ ------ ------------    ---------- -----------
T_PART        192       0             27       18-MAY-12

As we can see, the number of blocks has rocketed from 60 to 192, which is indicative of the fact that the data is no longer compressed with HCC compression.

Finally we repeat adding a column with a default value, but this time including the not null condition:


SQL :  db01> alter table t_part add city varchar2(30) default 'Oxford' not null;

Table altered.

SQL :  db01>  exec DBMS_STATS.gather_table_stats(ownname => 'SYS', tabname => 'T_PART', estimate_percent => 100);

PL/SQL procedure successfully completed.
SQL : db01> select table_name, blocks, empty_blocks, avg_row_len , last_analyzed from dba_tables where table_name='T_PART';

TABLE_NAME BLOCKS EMPTY_BLOCKS   AVG_ROW_LEN LAST_ANAL
---------- ------ -------------- ---------- ------------
T_PART        60      0          20         18-MAY-12

We see that with the technique of including a not null clause on the add column with a default value, the number of blocks has not changed, and hence the data must still be HCC compressed, as confirmed with the DBMS_COMPRESSION.GET_COMPRESSION_TYPE procedure.
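
For completeness, the check is the same decode around DBMS_COMPRESSION.GET_COMPRESSION_TYPE used in the previous entry, with &rowid being a rowid from the data you want to verify:

SQL : db01> select decode(
DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', '&rowid'),
    1, 'No Compression',
    2, 'Basic/OLTP Compression',
    4, 'HCC Query High',
    8, 'HCC Query Low',
   16, 'HCC Archive High',
   32, 'HCC Archive Low',
       'Unknown Compression Level') compression_type
from dual;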

Essentially, if any column that you add to an HCC compressed table can be defined as not null, then you can be sure that specifying a default value will not undo your HCC compression.

If you do need to allow nulls, then getting away without a default value would be best, and perhaps only updating recent data rather than all historical data would at least preserve some data as HCC compressed. Be aware that uncompressing HCC compressed data can obviously lead to a large increase in your storage requirements.

Adding Columns and Exadata HCC compression

While everyone is aware of the issues of mixing EHCC compression and OLTP type activities, I had a customer who was interested in finding out what happens upon adding a column to a table that has EHCC compression enabled on it.

As I could not see any definitive statements in the documentation on this particular scenario I ran up some tests to see the behaviour.

First of all they are using partitioning by date range, so we create a partitioned table:

SQL: db01> create table t_part  ( 
username varchar2(30), 
user_id  number, 
created date ) 
partition by range (created) 
( partition p_2009 values less than (to_date('31-DEC-2009', 'dd-MON-YYYY')) tablespace users, 
partition p_2010 values less than (to_date('31-DEC-2010', 'dd-MON-YYYY')) tablespace users, 
partition p_2011 values less than (to_date('31-DEC-2011', 'dd-MON-YYYY')) tablespace users, 
partition p_2012 values less than (to_date('31-DEC-2012', 'dd-MON-YYYY')) tablespace users )

/

Table created

The customer is particularly interested in using partitioning for ILM-type scenarios, in that they will compress historical partitions but not more up-to-date ones. Let’s enable HCC compression on the table and check that it’s on:


SQL: db01> alter table t_part compress for query high 
/

Table altered

SQL: db01> select table_name, partition_name, compression, compress_for 
from all_tab_partitions 
where table_name='T_PART' 
/

TABLE_NAME                     PARTITION_NAME                 COMPRESS COMPRESS_FOR 
------------------------------ ------------------------------ -------- ------------ 
T_PART                         P_2009                         ENABLED  QUERY HIGH 
T_PART                         P_2010                         ENABLED  QUERY HIGH 
T_PART                         P_2011                         ENABLED  QUERY HIGH 
T_PART                         P_2012                         ENABLED  QUERY HIGH

Let’s insert some data and check that an actual row is compressed (thanks to Kerry Osborne):


SQL: db01> insert /*+ APPEND */ into t_part select * from all_users 
/ 
3008 rows created
SQL: db01> commit
/
Commit complete

SQL: db01> select max(rowid) from t_part
/

MAX(ROWID) 
------------------ 
AAAexSAANAAHGoUAAN

SQL: db01> select decode( 
DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', '&rowid'), 
    1, 'No Compression', 
    2, 'Basic/OLTP Compression', 
    4, 'HCC Query High', 
    8, 'HCC Query Low', 
   16, 'HCC Archive High', 
   32, 'HCC Archive Low', 
   'Unknown Compression Level') compression_type 
from dual;

Enter value for rowid: AAAexSAANAAHGoUAAN 
old   2: DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', '&rowid'), 
new   2: DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', 'AAAexSAANAAHGoUAAN'),

COMPRESSION_TYPE 
------------------------- 
HCC Query High

So we are confident we have a row that is compressed. Now we add a new column to the table and give it a default value, we then check again what compression the row has:

SQL: db01> alter table t_part add city varchar2(30) default 'Oxford' 
/

Table altered.

SQL: db01> select decode( 
DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', '&rowid'), 
    1, 'No Compression', 
    2, 'Basic/OLTP Compression', 
    4, 'HCC Query High', 
    8, 'HCC Query Low', 
   16, 'HCC Archive High', 
   32, 'HCC Archive Low', 
    'Unknown Compression Level') compression_type 
from dual; 
Enter value for rowid: AAAexSAANAAHGoUAAN 
old   2: DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', '&rowid'), 
new   2: DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', 'AAAexSAANAAHGoUAAN'),

COMPRESSION_TYPE 
------------------------- 
Basic/OLTP Compression

Oh Dear! Our compression has changed.

This is maybe not that surprising. But what if you have a requirement to add a column with no default value, and you just want to update more recent records: can we avoid downgrading all records from HCC compression?

So we are using the same table and data as before. We will focus on two rows, one in the 2011 partition and one in the 2012 partition.

SQL: db01> select max(rowid) from t_part where created  > TO_DATE('31-Dec-2010', 'DD-MM-YYYY') and created < TO_DATE('01-Jan-2012', 'DD-MM-YYYY');

MAX(ROWID) 
------------------ 
AAAezbAAHAAFwIKAE/

SQL: db01> select decode( 
DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', '&rowid'), 
    1, 'No Compression', 
    2, 'Basic/OLTP Compression', 
    4, 'HCC Query High', 
    8, 'HCC Query Low', 
    16, 'HCC Archive High', 
    32, 'HCC Archive Low', 
    'Unknown Compression Level') compression_type 
from dual;  
Enter value for rowid: AAAezbAAHAAFwIKAE/ 
old   2: DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', '&rowid'), 
new   2: DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', 'AAAezbAAHAAFwIKAE/'),

COMPRESSION_TYPE 
------------------------- 
HCC Query High

SQL: db01> select max(rowid) from t_part where created  > TO_DATE('31-Dec-2011', 'DD-MM-YYYY') and created < TO_DATE('31-Dec-2012', 'DD-MM-YYYY');

MAX(ROWID) 
------------------ 
AAAezcAAHAAHdoSADf

SQL: db01> select decode( 
    DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', '&rowid'), 
    1, 'No Compression', 
    2, 'Basic/OLTP Compression', 
    4, 'HCC Query High', 
    8, 'HCC Query Low', 
    16, 'HCC Archive High', 
    32, 'HCC Archive Low', 
    'Unknown Compression Level') compression_type 
from dual; 
Enter value for rowid: AAAezcAAHAAHdoSADf 
old   2: DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', '&rowid'), 
new   2: DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', 'AAAezcAAHAAHdoSADf'),

COMPRESSION_TYPE 
------------------------- 
HCC Query High

Now we add a column to the table and update the records in only the 2012 partition:

SQL: db01> alter table t_part add city varchar2(30);

Table altered.

SQL: db01> update t_part set city='Oxford' where created > to_date('31-Dec-2011', 'DD-MM-YYYY');

448 rows updated.

SQL: db01> commit;

Commit complete.

And now we again check the compression status of our two rows:

SQL: db01> select decode( 
DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', '&rowid'), 
    1, 'No Compression', 
    2, 'Basic/OLTP Compression', 
    4, 'HCC Query High', 
    8, 'HCC Query Low', 
   16, 'HCC Archive High', 
   32, 'HCC Archive Low', 
       'Unknown Compression Level') compression_type 
from dual;  
Enter value for rowid: AAAezbAAHAAFwIKAE/ 
old   2: DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', '&rowid'), 
new   2: DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', 'AAAezbAAHAAFwIKAE/'),

COMPRESSION_TYPE 
------------------------- 
HCC Query High

SQL: db01> select decode( 
DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', '&rowid'), 
    1, 'No Compression', 
    2, 'Basic/OLTP Compression', 
    4, 'HCC Query High', 
    8, 'HCC Query Low', 
    16, 'HCC Archive High', 
    32, 'HCC Archive Low', 
        'Unknown Compression Level') compression_type 
   from dual; 
Enter value for rowid: AAAezcAAHAAHdoSADf 
old   2: DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', '&rowid'), 
new   2: DBMS_COMPRESSION.GET_COMPRESSION_TYPE ( 'SYS', 'T_PART', 'AAAezcAAHAAHdoSADf'),

COMPRESSION_TYPE 
------------------------- 
Basic/OLTP Compression

So that is great: we have a way of evolving table definitions without having to give up HCC compression on the whole set of historical data.

Creating ASM diskgroups on Exadata with ASMCA

I recently had the chance to create some diskgroups on an Exadata box outside the standard installation procedure. While this is not necessarily Exadata specific, I thought the technique of using ASMCA silently on the command line to create the diskgroups was sufficiently novel for a short blog posting, if for nothing else but to remind myself how to do this in future.

This example uses the disks presented from a quarter rack Exadata system and creates a diskgroup called DATA01:

asmca -silent -createDiskGroup -diskGroupName 'DATA01' -diskList o/192.168.10.14/DATA01_CD_00_cel01,o/192.168.10.14/DATA01_CD_01_cel01,
o/192.168.10.14/DATA01_CD_02_cel01,o/192.168.10.14/DATA01_CD_03_cel01,
o/192.168.10.14/DATA01_CD_04_cel01,o/192.168.10.14/DATA01_CD_05_cel01,
o/192.168.10.14/DATA01_CD_06_cel01,o/192.168.10.14/DATA01_CD_07_cel01,
o/192.168.10.14/DATA01_CD_08_cel01,o/192.168.10.14/DATA01_CD_09_cel01,
o/192.168.10.14/DATA01_CD_10_cel01,o/192.168.10.14/DATA01_CD_11_cel01,
o/192.168.10.15/DATA01_CD_00_cel02,o/192.168.10.15/DATA01_CD_01_cel02,
o/192.168.10.15/DATA01_CD_02_cel02,o/192.168.10.15/DATA01_CD_03_cel02,
o/192.168.10.15/DATA01_CD_04_cel02,o/192.168.10.15/DATA01_CD_05_cel02,
o/192.168.10.15/DATA01_CD_06_cel02,o/192.168.10.15/DATA01_CD_07_cel02,
o/192.168.10.15/DATA01_CD_08_cel02,o/192.168.10.15/DATA01_CD_09_cel02,
o/192.168.10.15/DATA01_CD_10_cel02,o/192.168.10.15/DATA01_CD_11_cel02,
o/192.168.10.16/DATA01_CD_00_cel03,o/192.168.10.16/DATA01_CD_01_cel03,
o/192.168.10.16/DATA01_CD_02_cel03,o/192.168.10.16/DATA01_CD_03_cel03,
o/192.168.10.16/DATA01_CD_04_cel03,o/192.168.10.16/DATA01_CD_05_cel03,
o/192.168.10.16/DATA01_CD_06_cel03,o/192.168.10.16/DATA01_CD_07_cel03,
o/192.168.10.16/DATA01_CD_08_cel03,o/192.168.10.16/DATA01_CD_09_cel03,
o/192.168.10.16/DATA01_CD_10_cel03,o/192.168.10.16/DATA01_CD_11_cel03 -redundancy NORMAL -compatible.asm 11.2.0.0 -compatible.rdbms 11.2.0.0 -sysAsmPassword welcome1 -silent

Note I’ve edited the above to have line breaks after every couple of disks for readability. Of course you can specify your required redundancy level and different compatible parameters. You can see the 3 cells here that make up an Exadata 1/4 rack, and you can also see the InfiniBand IP address of each of these cells in the path to each griddisk.
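
If you want to check what griddisks are actually visible before building the command line, a minimal sketch run against the ASM instance on a compute node will list the disk paths and whether they are already in use:

SQL> select path, header_status from v$asm_disk order by path;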

You can also see the nice mapping between each of the griddisk names and the storage cells upon which they reside (e.g. DATA01_CD_00_cel01 is the DATA01 griddisk created on celldisk 00 of storage cell cel01). I really like this naming feature of Exadata; it makes life that little bit more straightforward for the administrator.

The only other thing to be aware of: when creating the DBFS_DG diskgroup, the CD_00 and CD_01 griddisks don’t exist, as the corresponding storage on each cell is used for the system disks.
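
So for DBFS_DG the disk list starts at CD_02 on each cell. Rather than typing all those paths out, something like the following bash sketch will build the list and create the diskgroup; the griddisk names assume the standard DBFS_DG_CD_nn_cellname convention and the same cell IPs as above:

DISKS=""
for CELL in cel01:192.168.10.14 cel02:192.168.10.15 cel03:192.168.10.16; do
  NAME=${CELL%%:*}; IP=${CELL##*:}
  for CD in 02 03 04 05 06 07 08 09 10 11; do
    # append each griddisk path, comma separated
    DISKS="${DISKS:+$DISKS,}o/${IP}/DBFS_DG_CD_${CD}_${NAME}"
  done
done
asmca -silent -createDiskGroup -diskGroupName 'DBFS_DG' -diskList "$DISKS" \
  -redundancy NORMAL -compatible.asm 11.2.0.0 -compatible.rdbms 11.2.0.0 \
  -sysAsmPassword welcome1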

UKOUG Exa Day

I had a great day at the UKOUG Exa.. Day yesterday. I was happy by and large with how my presentation went; it was a bit irritating having some technical issues with laptops and projectors, but hopefully the audience was entertained enough not to be too annoyed by that. I’ve included a link to the powerpoint of the presentation; you really need to read the notes to gain an understanding of what I was trying to say!

Apologies that it is a near 7MB download, but I’d be happy to answer any questions you may have on it, and all feedback is welcome.

As for the day itself, there was a great atmosphere at the event, and many, many familiar faces from UKOUG events of the past, in particular many faces from the RAC SIG. I particularly enjoyed Frits Hoogland’s presentation on Exadata and OLTP. He even alluded to the fact that Exadata Smart Flash Logging may not be as beneficial to OLTP as you may be led to believe. The other take-away message from his presentation was to test everything – don’t take things for granted!

Tanel Poder was great to listen to and I think he could have gone on for many hours more talking about Exadata performance.

The day rounded off with great chat and beers for all the delegates courtesy of e-dba.

Exadata Smart Flash Logging

With the 11.2.2.4.0 release of the Exadata storage server software (and providing you are at least at 11.2.0.2 BP11), you will have the opportunity to utilise Exadata Smart Flash Logging. I thought I’d take a look at how much (if any) improvement this feature would provide to a busy production environment.

Have a look at this blog entry on Exadata Smart Flash Logging by Luis Moreno Campos for an introduction on how it works. Basically two writes are now issued, one to flash and one to your disk-based redo logs, and the fastest write is the winner.

First let’s check that we actually have some Exadata Smart Flash Logs available to be used:

CellCLI> list flashlog detail

name                      cel07_FLASHLOG 
cellDisk FD_00_cel07,FD_08_cel07,FD_09_cel07,FD_01_cel07,FD_15_cel07,FD_06_cel07,FD_03_cel07,FD_04_cel07,FD_07_cel07,FD_02_cel07,FD_12_cel07,FD_14_cel07,FD_13_cel07,FD_11_cel07,FD_05_cel07,FD_10_cel07
 creationTime              2012-03-17T15
 degradedCelldisks 
 effectiveSize             512M 
 efficiency                100.0 
 id                        a24a25e5-062e-4be1-bb6b-3168113a5fe8 
 size                      512M 
 status                    normal

First we can see that on this cell there is a flashlog created of size 512M. It is carved out of each of the 16 flash DOMs in the cell. Consequently this reduces the amount you have for your flashcache, though it’s a very small reduction.
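
If you ever need to (re)create the flashlog by hand, my recollection of the storage server docs is that the sequence is roughly the following, run via CellCLI on each cell; double-check the documentation for your cell software version before running this anywhere that matters:

CellCLI> drop flashcache
CellCLI> create flashlog all size=512M
CellCLI> create flashcache all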

How much use are we getting out of the flashlog? Well, we can look at some metrics:

CellCLI> list metriccurrent where objectType='FLASHLOG'
 FL_ACTUAL_OUTLIERS                 FLASHLOG        0 IO requests 
 FL_BY_KEEP                         FLASHLOG        0 
 FL_DISK_FIRST                      FLASHLOG        6,168,815 IO requests 
 FL_DISK_IO_ERRS                    FLASHLOG        0 IO requests 
 FL_EFFICIENCY_PERCENTAGE           FLASHLOG        100 % 
 FL_EFFICIENCY_PERCENTAGE_HOUR      FLASHLOG        100 % 
 FL_FLASH_FIRST                     FLASHLOG        172,344 IO requests 
 FL_FLASH_IO_ERRS                   FLASHLOG        0 IO requests 
 FL_FLASH_ONLY_OUTLIERS             FLASHLOG        0 IO requests 
 FL_IO_DB_BY_W                      FLASHLOG        286,075 MB 
 FL_IO_DB_BY_W_SEC                  FLASHLOG        13.328 MB/sec 
 FL_IO_FL_BY_W                      FLASHLOG        303,793 MB 
 FL_IO_FL_BY_W_SEC                  FLASHLOG        13.761 MB/sec 
 FL_IO_W                            FLASHLOG        6,341,159 IO requests 
 FL_IO_W_SKIP_BUSY                  FLASHLOG        0 IO requests 
 FL_IO_W_SKIP_BUSY_MIN              FLASHLOG        0.0 IO/sec 
 FL_IO_W_SKIP_LARGE                 FLASHLOG        0 IO requests 
 FL_PREVENTED_OUTLIERS              FLASHLOG        415 IO requests

First off, this is taken on a very busy system:

FL_IO_FL_BY_W_SEC: 13.761 MB/sec

That is saying how much data is being written to flash by smart flash log. Well, that sounds great, but it’s not quite so simple. Remember writes go to both flash and disk.

FL_DISK_FIRST: 6,168,815 IO requests

This metric is actually telling us that 6.1M I/O requests were serviced first by disk. While:

FL_FLASH_FIRST: 172,344 IO requests

Is saying this number went to flash first. Oh, that’s not a great improvement! I make that 2.7% of writes (172,344 out of 6,341,159 in total) went to flash first.

Finally, a word on FL_PREVENTED_OUTLIERS: this is saying there were 415 writes on this cell that would have taken more than 0.5 seconds had there been no flash logging in place.

I have also checked AWR reports from before and after having flash logging in place. There is very little change: AWR shows an average wait of 3ms for log file parallel write. Have a look at the wait event histogram for this:

We see the vast majority of writes are under 1ms. This was also the case before flash logging, so it has not improved things at all.

This is a busy, CPU-bound system; let’s look at the log file sync wait event histogram:

Eurgh! is the only way to describe this.

I think Kevin Closson covered this a mere half-decade ago!

Exadata Flash Storage

Exadata flash storage is provided by the Sun Flash Accelerator F20 PCIe card shown above. Four of these cards are installed in every Exadata storage cell. There is a Documentation set available to peruse.

First, we can see these devices using lsscsi:

[root@cel01 ~]# lsscsi |grep  MARVELL 
[8:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdn 
[8:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdo 
[8:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdp 
[8:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdq 
[9:0:0:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdr 
[9:0:1:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sds 
[9:0:2:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdt 
[9:0:3:0]    disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdu 
[10:0:0:0]   disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdv 
[10:0:1:0]   disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdw 
[10:0:2:0]   disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdx 
[10:0:3:0]   disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdy 
[11:0:0:0]   disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdz 
[11:0:1:0]   disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdaa 
[11:0:2:0]   disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdab 
[11:0:3:0]   disk    ATA      MARVELL SD88SA02 D20Y  /dev/sdac

You can see they are bunched into 4 groups of 4: 8:, 9:, 10:, and 11:. This is because the 4 cards each have 4 FDOMs, so on every Exadata storage cell the flash is presented as 16 separate devices.

We can also use the flash_dom command:


[root@cel01 ~]# flash_dom -l

Aura Firmware Update Utility, Version 1.2.7

Copyright (c) 2009 Sun Microsystems, Inc. All rights reserved..

U.S. Government Rights - Commercial Software. Government users are subject 
to the Sun Microsystems, Inc. standard license agreement and 
applicable provisions of the FAR and its supplements.

Use is subject to license terms.

This distribution may include materials developed by third parties.

Sun, Sun Microsystems, the Sun logo, Sun StorageTek and ZFS are trademarks 
or registered trademarks of Sun Microsystems, Inc. or its subsidiaries, 
in the U.S. and other countries.



 HBA# Port Name         Chip Vendor/Type/Rev    MPT Rev  Firmware Rev  IOC     WWID                 Serial Number

 1.  /proc/mpt/ioc0    LSI Logic SAS1068E C0     105      011b5c00     0       5080020000fe34c0     465769T+1130A405XA

        Current active firmware version is 011b5c00 (1.27.92) 
        Firmware image's version is MPTFW-01.27.92.00-IT 
        x86 BIOS image's version is MPTBIOS-6.26.00.00 (2008.10.14) 
        FCode image's version is MPT SAS FCode Version 1.00.49 (2007.09.21)


          D#  B___T  Type       Vendor   Product          Rev    Operating System Device Name 
          1.  0   0  Disk       ATA      MARVELL SD88SA02 D20Y   /dev/sdn    [8:0:0:0] 
          2.  0   1  Disk       ATA      MARVELL SD88SA02 D20Y   /dev/sdo    [8:0:1:0] 
          3.  0   2  Disk       ATA      MARVELL SD88SA02 D20Y   /dev/sdp    [8:0:2:0] 
          4.  0   3  Disk       ATA      MARVELL SD88SA02 D20Y   /dev/sdq    [8:0:3:0]

 2.  /proc/mpt/ioc1    LSI Logic SAS1068E C0     105      011b5c00     0       5080020000fe3440     465769T+1130A405X7

        Current active firmware version is 011b5c00 (1.27.92) 
        Firmware image's version is MPTFW-01.27.92.00-IT 
        x86 BIOS image's version is MPTBIOS-6.26.00.00 (2008.10.14) 
        FCode image's version is MPT SAS FCode Version 1.00.49 (2007.09.21)


          D#  B___T  Type       Vendor   Product          Rev    Operating System Device Name 
          1.  0   0  Disk       ATA      MARVELL SD88SA02 D20Y   /dev/sdr    [9:0:0:0] 
          2.  0   1  Disk       ATA      MARVELL SD88SA02 D20Y   /dev/sds    [9:0:1:0] 
          3.  0   2  Disk       ATA      MARVELL SD88SA02 D20Y   /dev/sdt    [9:0:2:0] 
          4.  0   3  Disk       ATA      MARVELL SD88SA02 D20Y   /dev/sdu    [9:0:3:0]
.
.

The output above has been edited for brevity. You can even have a look at the devices on the filesystem, for example under /proc/mpt/ioc1.

We can also of course look at these devices via cellcli:


CellCLI> list physicaldisk where diskType='FlashDisk' 
         FLASH_1_0       1113M086V3      normal 
         FLASH_1_1       1113M086V4      normal 
         FLASH_1_2       1113M086V0      normal 
         FLASH_1_3       1113M086UY      normal 
         FLASH_2_0       1113M0892K      normal 
         FLASH_2_1       1113M086TR      normal 
         FLASH_2_2       1113M0891P      normal 
         FLASH_2_3       1113M0892L      normal 
         FLASH_4_0       1113M086UP      normal 
         FLASH_4_1       1113M086UQ      normal 
         FLASH_4_2       1113M086UT      normal 
         FLASH_4_3       1113M086UN      normal 
         FLASH_5_0       1113M08AGJ      normal 
         FLASH_5_1       1112M07V6U      normal 
         FLASH_5_2       1113M08AKJ      normal 
         FLASH_5_3       1113M08AH5      normal

Again they are presented as 4 lots of 4, with a diskType of FlashDisk. Looking in on the detail of one of the flashdisks:


CellCLI>  list physicaldisk where diskType='FlashDisk' detail

  name:                   FLASH_5_3 
         diskType:               FlashDisk 
         errCmdTimeoutCount:     0 
         errHardReadCount:       0 
         errHardWriteCount:      0 
         errMediaCount:          0 
         errOtherCount:          0 
         errSeekCount:           0 
         luns:                   5_3 
         makeModel:              "MARVELL SD88SA02" 
         physicalFirmware:       D20Y 
         physicalInsertTime:     2011-12-07T19:00:02+00:00 
         physicalInterface:      sas 
         physicalSerial:         1113M08AH5 
         physicalSize:           22.8880615234375G 
         sectorRemapCount:       0 
         slotNumber:             "PCI Slot: 5; FDOM: 3" 
         status:                 normal

I’ve edited the above to show just the detail for the FLASH_5_3 device, basically the last FDOM slot on the highest numbered PCI slot. You can see the size of each of the FDOMs is 22.8880615234375G, which multiplied by 16 gives 366.21G.

We can also look at the lun level:

CellCLI> list lun where id='5_3' detail 
         name:                   5_3 
         cellDisk:               FD_15_cel01 
         deviceName:             /dev/sdy 
         diskType:               FlashDisk 
         id:                     5_3 
         isSystemLun:            FALSE 
         lunAutoCreate:          FALSE 
         lunSize:                22.8880615234375G 
         overProvisioning:       100.0 
         physicalDrives:         FLASH_5_3 
         status:                 normal

You can see each lun has a celldisk name associated with it, and a sensible naming convention. Finally drilling down into the celldisk detail:

CellCLI> list celldisk where name='FD_15_cel01' detail 
         name:                   FD_15_cel01 
         comment: 
         creationTime:           2012-01-10T10:13:06+00:00 
         deviceName:             /dev/sdy 
         devicePartition:        /dev/sdy 
         diskType:               FlashDisk 
         errorCount:             0 
         freeSpace:              0 
         id:                     8ddbd2c8-8446-4735-8948-d8aea5744b35 
         interleaving:           none 
         lun:                    5_3 
         size:                   22.875G 
         status:                 normal

The final point of interest on the flash cards is the white part, middle top on the card. That is the Energy Storage Module (ESM), and it has a set lifetime. According to the F20 docs, on a V2 its lifetime was expected to be 3 years. You can monitor the health and lifetime of your modules with the following ipmi command:

[root@cel01 ~]# for RISER in RISER1/PCIE1 RISER1/PCIE4 RISER2/PCIE2 RISER2/PCIE5; do ipmitool sunoem cli "show /SYS/MB/$RISER/F20CARD/UPTIME"; done

Connected. Use ^D to exit. 
-> show /SYS/MB/RISER1/PCIE1/F20CARD/UPTIME

 /SYS/MB/RISER1/PCIE1/F20CARD/UPTIME 
    Targets:

    Properties: 
        type = Power Unit 
        ipmi_name = PCIE1/F20/UP 
        class = Threshold Sensor 
        value = 9844.000 Hours 
        upper_nonrecov_threshold = 26220.000 Hours 
        upper_critical_threshold = 25806.000 Hours 
        upper_noncritical_threshold = 25254.000 Hours 
        lower_noncritical_threshold = N/A 
        lower_critical_threshold = N/A 
        lower_nonrecov_threshold = N/A 
        alarm_status = cleared

    Commands: 
        cd 
        show

-> Session closed 
Disconnected

I’ve edited the output above down to just one riser card, just to prevent boredom. You are looking to ensure that the value, here showing value = 9844.000 Hours, is less than the upper_noncritical_threshold, which in this case it is. If the value is greater than the threshold, have the ESM replaced.

So far I’ve found the flash cards on both V2 and X2-2 to be very reliable, I’d be interested in hearing other thoughts on their reliability.