The Dutch Prutser's Blog

By: Harald van Breederode

  • Disclaimer

    The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Archive for the ‘Oracle’ Category

Temporary undo

Posted by Harald van Breederode on July 28, 2013

Back in Oracle 8i the Global Temporary Table was introduced to reduce the overhead of storing temporary data inside the database. A Global Temporary Table stores its data in a Temporary Tablespace, where the data is retained for the duration of either a single transaction or the lifetime of a session. Performing DML on a Global Temporary Table doesn’t generate redo, because the data is stored in a temporary tablespace, but it does generate undo, which in turn generates redo, since the undo is stored in a permanent tablespace. Because this redo isn’t required for recovery purposes, wouldn’t it be nice if it weren’t generated in the first place?

Oracle 12c makes it possible to store the undo generated by performing DML on global temporary tables in a temporary tablespace. This feature is called Temporary Undo and is controlled by the temp_undo_enabled database parameter, which is modifiable at the system or session level, as shown in the example after the list below. There are two reasons for enabling this new feature:

  1. To reduce the amount of redo caused by DML on global temporary tables.
  2. To allow DML on global temporary tables in a physical standby database.
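
Enabling the feature is a one-liner at either scope. Here is a minimal sketch (session scope first, then instance-wide):

SQL> alter session set temp_undo_enabled = true;

Session altered.

SQL> alter system set temp_undo_enabled = true;

System altered.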

Using temporary undo to reduce the amount of redo

One reason for using temporary undo is to reduce the amount of redo caused by DML on global temporary tables. All we need to do to demonstrate this is insert a bunch of rows in a global temporary table. So, let’s get started by creating a global temporary table in an Oracle 12c database:

SQL> connect sh/sh
Connected.
SQL> create global temporary table gtt(c1 number(8), c2 char(100));

Table created.

By default the temp_undo_enabled parameter is set to false, so inserting rows in a global temporary table will generate redo as shown below:

SQL> connect sh/sh
Connected.
SQL> show parameter temp_undo_enabled

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
temp_undo_enabled                    boolean     FALSE

SQL> select name, value from v$mystat natural join v$statname
  2  where name = 'redo size' or name = 'redo entries';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                              2
redo size                                                               720

SQL> insert into gtt select level, level from dual connect by level <= 10000;

10000 rows created.

SQL> select name, value from v$mystat natural join v$statname
  2  where name = 'redo size' or name = 'redo entries';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                            287
redo size                                                             63064

Based on this output we can derive that inserting 10,000 rows generated 285 redo entries (287 - 2) with a total of roughly 62 KB of redo (63,064 - 720 = 62,344 bytes).

Let’s repeat this test with temp_undo_enabled set to true and see what happens:

SQL> connect sh/sh
Connected.
SQL> alter session set temp_undo_enabled = true;

Session altered.

SQL> select name, value from v$mystat natural join v$statname
  2  where name = 'redo size' or name = 'redo entries';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                              2
redo size                                                               720

SQL> insert into gtt select level, level from dual connect by level <= 10000;

10000 rows created.

SQL> select name, value from v$mystat natural join v$statname
  2  where name = 'redo size' or name = 'redo entries';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                              3
redo size                                                               992

This time inserting 10,000 rows generated only one redo entry (3 - 2) containing 272 bytes of redo (992 - 720). Hopefully this is what you expected ;-)
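
To double-check that the undo really went to the temporary tablespace, the 12c view V$TEMPUNDOSTAT can be queried. A sketch, assuming its BEGIN_TIME, TXNCOUNT and UNDOBLKCNT columns; the actual numbers will vary per system:

SQL> select begin_time, txncount, undoblkcnt from v$tempundostat;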

Temporary undo and a physical standby database

Another reason for using temporary undo is to allow DML on global temporary tables in a physical standby database that is opened read-only. To demonstrate this we need an Oracle 12c Data Guard environment as shown below:

DGMGRL> show configuration;

Configuration - Animals

  Protection Mode: MaxPerformance
  Databases:
    dogs  - Primary database
    sheep - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

DGMGRL> show database sheep 

Database - sheep

  Role:              PHYSICAL STANDBY
  Intended State:    APPLY-ON
  Transport Lag:     0 seconds (computed 1 second ago)
  Apply Lag:         0 seconds (computed 1 second ago)
  Apply Rate:        19.00 KByte/s
  Real Time Query:   ON
  Instance(s):
    v1210

Database Status:
SUCCESS

Temporary undo is automatically enabled when connected to a physical standby database, so there is no need to set the temp_undo_enabled parameter. Let’s see what happens if we insert rows into a global temporary table in a physical standby database that is opened read-only:

SQL> connect sh/sh@sheep
Connected.
SQL> select name, value from v$mystat natural join v$statname
  2  where name = 'redo size' or name = 'redo entries';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                              0
redo size                                                                 0

SQL> insert into gtt select level, level from dual connect by level <= 10000;

10000 rows created.

SQL> select name, value from v$mystat natural join v$statname
  2  where name = 'redo size' or name = 'redo entries';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                              0
redo size                                                                 0 

Wow, we can insert data into a database that is opened read-only without generating redo! This used to be impossible in earlier releases of Oracle. The output below was captured in an Oracle 11g Data Guard environment:

SQL> connect sh/sh@sheep11g
Connected.

SQL> insert into gtt select level, level from dual connect by level <= 10000;
insert into gtt select level, level from dual connect by level <= 10000
            *
ERROR at line 1:
ORA-16000: database open for read-only access

In conclusion

Temporary undo stores the undo generated by DML on global temporary tables in a temporary tablespace, thereby eliminating the need to generate redo. This also makes it possible to perform DML on global temporary tables in a physical standby database that is opened read-only.

Temporary undo is good for performance in general, and for Data Guard performance in particular, because DML on global temporary tables in the physical standby database does not generate redo, and DML on global temporary tables in the primary database generates less redo to be transmitted over the network to the standby database(s).
-Harald

Posted in Oracle | 3 Comments »

Demystifying ASM REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB

Posted by Harald van Breederode on January 3, 2013

A recurring question in my classes is how Oracle Automatic Storage Management (ASM) calculates the REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB disk group values. As usual the answer is: It depends! ;-)

In short: the REQUIRED_MIRROR_FREE_MB value in V$ASM_DISKGROUP indicates how much free space is required in an ASM disk group to restore redundancy after the failure of an ASM disk or ASM failure group. The USABLE_FILE_MB value indicates how much space is available in an ASM disk group considering the redundancy level of the disk group. The question is: How does ASM calculate these values?

The answer can be found by creating ASM disk groups in different configurations and taking a look at the REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB values in V$ASM_DISKGROUP. All we need is a bunch of small disks.

ASM> select label,os_mb from v$asm_disk
  2  where label like 'SAN%' order by label;

LABEL                                OS_MB
------------------------------- ----------
SAN01                                  255
SAN02                                  255
SAN03                                  255
SAN04                                  255
SAN05                                  255
SAN06                                  255

REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB in external redundancy disk groups

Let’s get started by creating an external redundancy disk group on top of the above 6 disks:

ASM> create diskgroup demo external redundancy
  2  disk 'ORCL:san01'
  3  disk 'ORCL:san02'
  4  disk 'ORCL:san03'
  5  disk 'ORCL:san04'
  6  disk 'ORCL:san05'
  7  disk 'ORCL:san06'
  8  attribute 'compatible.asm' = '11.2.0.0.0';

Diskgroup created.

Now we can explore this disk group by querying several columns from V$ASM_DISKGROUP:

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     EXTERN       1530    1468        0   1468

The above output shows that the size of this disk group is 1530MB (TOTAL_MB), of which 1468MB (FREE_MB) is free space. Because this is an external redundancy disk group, all free space is available for storing files, as indicated by the USABLE_FILE_MB column. The REQUIRED_MIRROR_FREE_MB column is zero because ASM does not mirror user data in external redundancy disk groups.

Note: The “missing” 62MB is used for storing ASM metadata.

REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB in normal redundancy disk groups

Next we create a normal redundancy disk group with six failure groups of one disk each.

ASM> create diskgroup demo normal redundancy
  2  failgroup FG1 disk
  3  'ORCL:san01'
  4  failgroup FG2 disk
  5  'ORCL:san02'
  6  failgroup FG3 disk
  7  'ORCL:san03'
  8  failgroup FG4 disk
  9  'ORCL:san04'
 10  failgroup FG5 disk
 11  'ORCL:san05'
 12  failgroup FG6 disk
 13  'ORCL:san06'
 14  attribute 'compatible.asm' = '11.2.0.0.0';

Diskgroup created.

ASM uses failure groups to provide redundancy inside disk groups. Data stored in one failure group is also stored in another failure group to protect it against failures. In a normal redundancy disk group the loss of one failure group can be tolerated without affecting data availability. ASM will restore redundancy by re-creating the lost data in another failure group. But this requires that a certain amount of disk group space is available within the same disk group to store this data. Because of this, it is useful to know how big a single failure group actually is.

ASM> select failgroup,sum(total_mb) from v$asm_disk
  2  where label like 'SAN%' group by failgroup order by failgroup;

FAILGROUP                      SUM(TOTAL_MB)
------------------------------ -------------
FG1                                      255
FG2                                      255
FG3                                      255
FG4                                      255
FG5                                      255
FG6                                      255

The above output tells us that the size of a single failure group is 255MB. Let’s see how this influences the free space requirements and how it affects usable file space:

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     NORMAL       1530    1365      255    555

The above output indicates that 255MB of free space should be available in order to allow ASM to restore redundancy after a failure. The value is 255 because that is the size of one failure group. If the failure groups are not equally sized, ASM automatically selects the biggest failure group for this calculation.

The above output also indicates that we can use 555MB for storing files. This value is calculated as follows:

ASM> select trunc((free_mb - required_mirror_free_mb) / 2) as useable
  2  from v$asm_diskgroup where name='DEMO';

   USEABLE
----------
       555

In plain English: The amount of usable file space equals the amount of free disk space minus the disk space required to restore redundancy after a failure, divided by the redundancy level of the disk group, which is two in this case.
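
This calculation can be generalized into a single query. The following sketch maps the disk group type to its redundancy level with a DECODE (divisor 1 for external, 2 for normal, 3 for high); its result should match the USABLE_FILE_MB column for each disk group:

ASM> select name, type,
  2         trunc((free_mb - required_mirror_free_mb) /
  3               decode(type, 'NORMAL', 2, 'HIGH', 3, 1)) as useable
  4  from v$asm_diskgroup;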

Thus, based on the amount of free space, the amount of required free space and the redundancy level, ASM calculates the amount of space available for storing files. If the amount of free space changes, because a file gets created, resized or deleted, this value changes as well. This can be demonstrated by creating a data file in this disk group.

SQL> create tablespace foo datafile '+DEMO' size 200m;

Tablespace created.

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     NORMAL       1530     939      255    342

Because we created a 200MB data file, the amount of usable file space is reduced, as shown above. The reduction is slightly more than the file size because ASM allocated a bit more space for its metadata. Let’s add another file to our tablespace:

SQL> alter tablespace foo add datafile '+DEMO' size 200m;

Tablespace altered.

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     NORMAL       1530     534      255    139

As expected the amount of usable file space is reduced by slightly more than the actual file size. Can we create another 200MB data file?

SQL> alter tablespace foo add datafile '+DEMO' size 200m;

Tablespace altered.

We could indeed create another 200MB data file, despite the fact that ASM indicated that only 139MB of usable file space was available. It is important to note that ASM does not enforce the amount of free space indicated by REQUIRED_MIRROR_FREE_MB. Let’s take a look at the value of USABLE_FILE_MB now:

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     NORMAL       1530     129      255    -63

The value has turned negative! This means that if we encounter a failure, ASM will be unable to restore redundancy because there is not enough space to do so. It is therefore very important that USABLE_FILE_MB is monitored! If it becomes negative for a particular disk group, you should either increase the size of the disk group or free up some space in it. For now we will just drop the tablespace ;-)

SQL> drop tablespace foo;

Tablespace dropped.
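
Before moving on: monitoring for this condition can be as simple as the following sketch, which lists every mounted disk group that can no longer restore redundancy after a failure:

ASM> select name, usable_file_mb from v$asm_diskgroup
  2  where usable_file_mb < 0;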

Now let’s create another normal redundancy disk group, but this time with only three failure groups each with two disks instead of six failure groups each with only one disk:

ASM> create diskgroup demo normal redundancy
  2  failgroup FG1 disk
  3  'ORCL:san01',
  4  'ORCL:san02'
  5  failgroup FG2 disk
  6  'ORCL:san03',
  7  'ORCL:san04'
  8  failgroup FG3 disk
  9  'ORCL:san05',
 10  'ORCL:san06'
 11  attribute 'compatible.asm' = '11.2.0.0.0';

Diskgroup created.

Now that the disk group is created we again take a look at the sizes of its failure groups:

ASM> select failgroup,sum(total_mb) from v$asm_disk
  2  where label like 'SAN%' group by failgroup order by failgroup;

FAILGROUP                      SUM(TOTAL_MB)
------------------------------ -------------
FG1                                      510
FG2                                      510
FG3                                      510

All three failure groups are equally sized at 510MB. Let’s see how this is reflected in the space administration columns in V$ASM_DISKGROUP:

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     NORMAL       1530    1365      510    427

As expected, REQUIRED_MIRROR_FREE_MB is now 510, which is of course equal to the size of the biggest failure group. This results in 427MB of usable file space. As shown before, this value is calculated as follows:

ASM> select trunc((free_mb - required_mirror_free_mb) / 2) as useable
  2  from v$asm_diskgroup where name='DEMO';

   USEABLE
----------
       427

Next we create yet another normal redundancy disk group, but this time with only two failure groups each with three disks instead of three failure groups each with two disks:

ASM> create diskgroup demo normal redundancy
  2  failgroup FG1 disk
  3  'ORCL:san01',
  4  'ORCL:san02',
  5  'ORCL:san03'
  6  failgroup FG2 disk
  7  'ORCL:san04',
  8  'ORCL:san05',
  9  'ORCL:san06'
 10  attribute 'compatible.asm' = '11.2.0.0.0';

Diskgroup created.

Again we determine the size of all failure groups:

ASM> select failgroup,sum(total_mb) from v$asm_disk
  2  where label like 'SAN%' group by failgroup order by failgroup;

FAILGROUP                      SUM(TOTAL_MB)
------------------------------ -------------
FG1                                      765
FG2                                      765

Both failure groups are equally sized at 765MB. What will the impact be on the space administration?

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     NORMAL       1530    1416      255    580

Hmmm, REQUIRED_MIRROR_FREE_MB is only 255 (the size of one disk) instead of 765 (the size of one failure group), as one might expect. Apparently ASM only anticipates the failure of a single disk and no longer anticipates the failure of a complete failure group.

When you think about it, this actually makes sense! As stated at the beginning of this posting, ASM performs mirroring by storing the same data in different failure groups. A normal redundancy disk group requires at least two failure groups and a high redundancy disk group requires at least three. How can ASM restore redundancy in a normal redundancy disk group when there is only one failure group left? It can’t! Therefore ASM anticipates only the loss of a single disk.

For completeness, this is how ASM calculated the USABLE_FILE_MB value of 580:

ASM> select trunc((free_mb - required_mirror_free_mb) / 2) as useable
  2  from v$asm_diskgroup where name='DEMO';

   USEABLE
----------
       580

REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB in high redundancy disk groups

Now that we covered external and normal redundancy disk groups, we will take a look at high redundancy disk groups. So, let’s get started by creating a high redundancy disk group with six failure groups of one disk each and determine the size of its failure groups:

ASM> create diskgroup demo high redundancy
  2  failgroup FG1 disk
  3  'ORCL:san01'
  4  failgroup FG2 disk
  5  'ORCL:san02'
  6  failgroup FG3 disk
  7  'ORCL:san03'
  8  failgroup FG4 disk
  9  'ORCL:san04'
 10  failgroup FG5 disk
 11  'ORCL:san05'
 12  failgroup FG6 disk
 13  'ORCL:san06'
 14  attribute 'compatible.asm' = '11.2.0.0.0';

Diskgroup created.

ASM> select failgroup,sum(total_mb) from v$asm_disk
  2  where label like 'SAN%' group by failgroup order by failgroup;

FAILGROUP                      SUM(TOTAL_MB)
------------------------------ -------------
FG1                                      255
FG2                                      255
FG3                                      255
FG4                                      255
FG5                                      255
FG6                                      255

As expected all failure groups are equally sized at 255MB. Could there be a difference between the required free and usable disk space for a high redundancy disk group and a normal redundancy disk group with the same number of disks and failure groups? Let’s have a look:

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     HIGH         1530    1365      510    285

REQUIRED_MIRROR_FREE_MB is 510, which is equal to the combined size of two failure groups (the two biggest ones, if they are not equally sized). This is because a high redundancy disk group can tolerate the loss of two failure groups without affecting data availability.
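
For unequally sized failure groups, the two biggest ones can be summed with a top-N query along these lines (a sketch that builds on the earlier v$asm_disk query; on these equally sized disks it returns 510):

ASM> select sum(fg_mb) as req_free from
  2  (select sum(total_mb) as fg_mb from v$asm_disk
  3   where label like 'SAN%' group by failgroup
  4   order by fg_mb desc)
  5  where rownum <= 2;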

USABLE_FILE_MB is 285 and is calculated as follows:

ASM> select trunc((free_mb - required_mirror_free_mb) / 3) as useable
  2  from v$asm_diskgroup where name='DEMO';

   USEABLE
----------
       285

Finally we create another high redundancy disk group but this time with three failure groups each with two disks instead of six failure groups of one disk each, followed by querying the failure group sizes:

ASM> create diskgroup demo high redundancy
  2  failgroup FG1 disk
  3  'ORCL:san01',
  4  'ORCL:san02'
  5  failgroup FG2 disk
  6  'ORCL:san03',
  7  'ORCL:san04'
  8  failgroup FG3 disk
  9  'ORCL:san05',
 10  'ORCL:san06'
 11  attribute 'compatible.asm' = '11.2.0.0.0';

Diskgroup created.

ASM> select failgroup,sum(total_mb) from v$asm_disk
  2  where label like 'SAN%' group by failgroup order by failgroup;

FAILGROUP                      SUM(TOTAL_MB)
------------------------------ -------------
FG1                                      510
FG2                                      510
FG3                                      510

Hopefully the above output isn’t surprising, because we have already created quite a number of disk groups on the disks used to write this posting. If you managed to follow and understand everything so far, the next output should match your expectations:

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     HIGH         1530    1365      510    285

The 510 listed for REQUIRED_MIRROR_FREE_MB is the size of two disks and not the size of a single failure group. The reason is almost the same as for a normal redundancy disk group with only two failure groups, but this time we have a high redundancy disk group with only three failure groups. If one or two failure groups are lost, ASM cannot restore redundancy, as that would require at least three surviving failure groups.

Below is the calculation for the 285 listed as USABLE_FILE_MB:

ASM> select trunc((free_mb - required_mirror_free_mb) / 3) as useable
  2  from v$asm_diskgroup where name='DEMO';

   USEABLE
----------
       285

In summary

REQUIRED_MIRROR_FREE_MB is the amount of free space required to restore redundancy after the worst failure ASM can tolerate without affecting data availability. This amount depends on the redundancy level and the number of failure groups in the disk group.

Normal redundancy disk groups with at least three failure groups require an amount of free space equal to the size of the biggest failure group. Normal redundancy disk groups with only two failure groups require an amount of free space equal to the size of a single disk.

High redundancy disk groups with at least four failure groups require an amount of free space equal to the combined size of the two biggest failure groups. High redundancy disk groups with only three failure groups require an amount of free space equal to the combined size of two disks.

USABLE_FILE_MB is the amount of disk space available for storing user data. This amount depends on the total size of the disk group, the disk group redundancy, and the amount of free disk space that ASM requires to restore redundancy after a failure. USABLE_FILE_MB is calculated as follows:

USABLE_FILE_MB = (FREE_MB - REQUIRED_MIRROR_FREE_MB) / N, where N is 2 for normal redundancy and 3 for high redundancy.

-Harald

Posted in Oracle | 15 Comments »

Password file maintenance in a Data Guard environment

Posted by Harald van Breederode on June 13, 2011

In a previous posting I wrote about password file maintenance in a clustered ASM and RAC environment.
This article raised another question: Is there anything specific about password file maintenance in a Data Guard environment?

Yes, updating a password file in a Data Guard environment isn’t as straightforward as one might think. In this posting I will shed some light on this subject and show you how to properly update a password file in a Data Guard environment.

Let’s get started by taking a look at my Data Guard setup before diving into password file maintenance procedures.

DGMGRL> show configuration;

Configuration - PeppiEnKokki

  Protection Mode: MaxPerformance
  Databases:
    peppi - Primary database
    kokki - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

This output shows that I have a primary database called peppi, a physical standby database called kokki and that the overall status of my Data Guard configuration is healthy.

Note: peppi runs on host prutser and kokki runs on host el5.

Revealing the problem

The first question is: Are updates to the password file on the primary database propagated to the standby database(s)? We can easily figure this out by triggering an update to the password file on peppi and then querying the password file on kokki.

SQL> connect sys/oracle@peppi as sysdba
Connected.
SQL> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP SYSAS
------------------------------ ----- ----- -----
SYS                            TRUE  TRUE  FALSE

Currently there is only one entry in the password file on peppi. By granting SYSDBA to SYSTEM we can trigger an update to the password file as shown below:

SQL> grant sysdba to system;

Grant succeeded.

SQL> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP SYSAS
------------------------------ ----- ----- -----
SYS                            TRUE  TRUE  FALSE
SYSTEM                         TRUE  FALSE FALSE

There are now two entries in the password file on the primary database. How many entries are there on the standby database?

SQL> connect sys/oracle@kokki as sysdba
Connected.
SQL> select * from v$pwfile_users;

USERNAME                       SYSDB SYSOP SYSAS
------------------------------ ----- ----- -----
SYS                            TRUE  TRUE  FALSE

The output above makes clear that updates to the password file on the primary database are not propagated to the standby database.

Before continuing, we take away SYSDBA from SYSTEM because we don’t want SYSTEM to become too powerful, do we?

SQL> connect sys/oracle@peppi as sysdba
Connected.
SQL> revoke sysdba from system;

Revoke succeeded.

Does this affect Data Guard?

The next question is: Does this affect Data Guard? The primary database sends its redo to its standby database(s), and this redo transport is authenticated with the password of the SYS user (or of another user, if so configured) of the primary database. That is, the primary database logs into a standby database using the password stored in the password file for the user who ships the redo, which is most likely SYS. If the (encrypted) passwords of the primary database and the standby database(s) don’t match, redo transport will (eventually) run into trouble.
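
Which user authenticates redo transport is controlled by the redo_transport_user initialization parameter; when it is unset, SYS is used. A quick check (in this setup the value is expected to be empty):

SQL> show parameter redo_transport_user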

To demonstrate this behavior we will change the SYS password on peppi and see if or how it affects kokki.

SQL> alter user sys identified by prutser;

User altered.
SQL> connect sys/prutser@peppi as sysdba
Connected.
SQL> connect sys/prutser@kokki as sysdba
ERROR:
ORA-01017: invalid username/password; logon denied


Warning: You are no longer connected to ORACLE.

The above output shows (again) that an update to the password file on the primary database is not automatically propagated to the standby database(s).

SQL> connect sys/oracle@kokki as sysdba
Connected.

As a matter of fact, we can still connect to kokki using the old SYS password as shown above.

The next question is: Is redo transport still possible now that the passwords are no longer the same? Let’s switch a few logs on the primary database and see what happens.

SQL> connect sys/prutser@peppi as sysdba
Connected.
SQL> alter system switch logfile;

System altered.

SQL> alter system switch logfile;

System altered.

SQL> alter system switch logfile;

System altered.

DGMGRL> show configuration;

Configuration - PeppiEnKokki

  Protection Mode: MaxPerformance
  Databases:
    peppi - Primary database
    kokki - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

According to the above output everything is still fine. Apparently the password change didn’t affect the redo transport. This is because the primary database was already logged into the standby database, and as long as this connection remains open, there is no need for the standby database to re-authenticate the incoming redo. However, if we disable and re-enable redo transport, it becomes clear that we indeed have a problem, as shown below.

DGMGRL> edit database kokki set property LogShipping=off;
Property "logshipping" updated

DGMGRL> edit database kokki set property LogShipping=on;
Property "logshipping" updated

DGMGRL> show configuration;

Configuration - PeppiEnKokki

  Protection Mode: MaxPerformance
  Databases:
    peppi - Primary database
      Error: ORA-16778: redo transport error for one or more databases

    kokki - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
ERROR

It is pretty clear that redo transport has ceased, as indicated by the ORA-16778 error. It is important to realize that a problem caused by updating the password file on the primary database doesn’t necessarily show up immediately, but may surface much later.

What is an ORA-16778 anyway?

$ oerr ora 16778
16778, 00000, "redo transport error for one or more databases"
// *Cause:  The redo transport service was unable to send redo data to one
//          or more standby databases.
// *Action: Check the Data Guard broker log and Oracle alert log for
//          more details. Query the LogXptStatus property to see the
//          errors.

Because we just changed the SYS password on the primary database we already know what caused the ORA-16778. So the question is: How do we fix this?

In search of a solution

Maybe we can simply re-create the password file on kokki with the new SYS password? Let’s give it a try:

el5$ rm $ORACLE_HOME/dbs/orapwv1120

el5$ orapwd file=$ORACLE_HOME/dbs/orapwv1120 password=prutser

SQL> connect sys/prutser@kokki as sysdba
Connected.

That seems to work! The question is of course: Does Data Guard agree with my optimism?

SQL> connect sys/prutser@peppi as sysdba
Connected.
SQL> alter system switch logfile;

System altered.

SQL> alter system switch logfile;

System altered.

SQL> alter system switch logfile;

System altered.

DGMGRL> show configuration;

Configuration - PeppiEnKokki

  Protection Mode: MaxPerformance
  Databases:
    peppi - Primary database
      Error: ORA-16778: redo transport error for one or more databases

    kokki - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
ERROR

Hmmm, this doesn’t look too good, does it? Maybe we need to disable and re-enable redo shipment?

DGMGRL> edit database kokki set property LogShipping=off;
Property "logshipping" updated

DGMGRL> edit database kokki set property LogShipping=on;
Property "logshipping" updated

DGMGRL> show configuration;

Configuration - PeppiEnKokki

  Protection Mode: MaxPerformance
  Databases:
    peppi - Primary database
      Error: ORA-16778: redo transport error for one or more databases

    kokki - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
ERROR

No, this didn’t help either. Re-creating the password file on the standby database is not the solution. The reason for this lies in the way the passwords are encrypted in the password file: even if the passwords are the same, the result of the encryption is not.
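
This can be illustrated by creating two password files with an identical password and comparing them. A sketch using throw-away files in /tmp; the checksums should differ, presumably because each file receives its own random salt:

el5$ orapwd file=/tmp/pw1 password=prutser
el5$ orapwd file=/tmp/pw2 password=prutser
el5$ cksum /tmp/pw1 /tmp/pw2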

How about copying the password file from the primary database to the standby database?

The solution

We will copy the password file from peppi to kokki and see if it works:

$ scp $ORACLE_HOME/dbs/orapwv1120 el5:$ORACLE_HOME/dbs/orapwv1120
orapwv1120                                    100% 1536     1.5KB/s   00:00    

SQL> connect sys/prutser@kokki as sysdba
Connected.

Again we can connect to kokki with the updated SYS password, but is Data Guard happy now?

SQL> connect sys/prutser@peppi as sysdba
Connected.
SQL> alter system switch logfile;

System altered.

SQL> alter system switch logfile;

System altered.

SQL> alter system switch logfile;

System altered.

DGMGRL> show configuration;

Configuration - PeppiEnKokki

  Protection Mode: MaxPerformance
  Databases:
    peppi - Primary database
      Error: ORA-16778: redo transport error for one or more databases

    kokki - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
ERROR

Hmmm, again Data Guard doesn’t look like a happy camper. But things are not what they seem. If we wait long enough, Data Guard will re-open its archive destinations and redo will automatically flow again. We can trigger this by disabling and re-enabling redo shipment:

DGMGRL> edit database kokki set property LogShipping=off;
Property "logshipping" updated

DGMGRL> edit database kokki set property LogShipping=on;
Property "logshipping" updated

DGMGRL> show configuration;

Configuration - PeppiEnKokki

  Protection Mode: MaxPerformance
  Databases:
    peppi - Primary database
    kokki - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

Finally Data Guard is back in business, because the primary database is able to log in to its standby database(s) and thereby successfully ship redo to them.

In summary

In order to keep Data Guard going, we must copy the password file from the primary database to the standby database(s) after an update is made to the password file on the primary database. Happy Data Guarding ;-)
-Harald

Posted in Oracle | 19 Comments »

 