I encountered a curious failure in a dataguard environment, that seems interesting enough to distribute to a wider audience. The system was running 10.2.0.3 on Linux, with the datafiles stored in ASM. This was recorded recently in the RDBMS instance alert log:
MRP0: Background Media Recovery terminated with error 1237 Sun Mar 23 09:57:19 2008 Errors in file /opt/oracle/product/admin/STANDBY/bdump/standby1_mrp0_27165.trc: ORA-01237: cannot extend datafile 35 ORA-01110: data file 35: '+DATA4/standby/datafile35.dbf' ORA-17505: ksfdrsz:1 Failed to resize file to size 1624704 blocks ORA-15041: diskgroup space exhausted
While in the ASM instance alert log I found the following:
Sun Mar 23 09:57:18 2008 WARNING: allocation failure on disk VOL5 for file 286 xnum 12693
At first sight you might think this is an obvious case of the diskgroup filling up and that more space needs to be allocated to it. However when I checked how much free space was available I saw:
SQL> select name, total_mb, free_mb, usable_file_mb from v$asm_diskgroup; NAME TOTAL_MB FREE_MB USABLE_FILE_MB ------------------------------ ---------- ---------- -------------- DATA1 61439 29766 29766 DATA2 10239 3356 3356 DATA3 10239 3356 3356 DATA4 220391 25077 25077 FRA 68197 67959 67959
So as far as the V$ASM_DISKGROUP view was concerned there really was enough storage space to allocate to this datafile, note this datafile was already a considerable size so the amount it was extending was nothing compared to the 25GB free. Much scratching of heads ensued, and I started thinking if fragmentation could be responsible, but then I looked at the contents of the affected diskgroup using asmcmd.
I spotted that there was a datafile that had been removed a couple of weeks previously from the primary. At the same time as that datafile, several others had been removed and all these were gone. And then we remembered that the first drop tablespace command had not included the and datafiles clause. We have standby_file_management set to auto and this worked perfectly for files that were automatically removed on the primary. It did not work for the datafile that was removed manually, as you’d probably expect. Running rm within ASMCMD logged the following kind of thing into the alert log of the ASM instance:
SQL> alter diskgroup 'DATA4' drop file '+DATA4/STANDBY/removed_datafile.dbf'
Once this datafile was removed the standby could continue processing happily and whatever caused it to fail to extend the datafile was now not causing it a problem. The real question is why I could not allocate the space when the diskgroup was not really full?
I think what happened was that the RDBMS instance thought the space allocated to the datafile that was being used by the dropped tablesapace was now available for use again and the database instance tried to extend a tablespace into this space but found it was in fact still occupied by the datafile that had not been removed.
Clearly having the database clean up datafiles automatically is a really useful feature, and this becomes doubly so in the case of a dataguard environment. Certainly, I think it is a good idea to drop a tablespace with the including contents and datafiles clause.
I agree Jason, dropping tablespaces without including the files creates a mess which is hard to cleanup after.
Comment by Yas — April 1, 2008 @ 1:33 pm |
Hello Yas,
I think without the ASM weirdness, we still might have suffered from having this datafile hanging around unwittingly.
cheers,
jason.
Comment by jarneil — April 1, 2008 @ 2:12 pm |
[...] Jason Arniel writes, “I encountered a curious failure in a dataguard environment, that seems interesting enough to distribute to a wider audience.” It seemed to be that the diskgroup was filling up, but as Jason demonstrates, it turns out to be a matter of managing datafiles on a standby using ASM. [...]
Pingback by Log Buffer #91: a Carnival of the Vanities for DBAs — April 4, 2008 @ 4:46 pm |