The Dutch Prutser's Blog

By: Harald van Breederode


Temporary undo

Posted by Harald van Breederode on July 28, 2013

Back in Oracle 8i the Global Temporary Table was introduced to reduce the overhead of storing temporary data inside the database. A Global Temporary Table stores its data in a temporary tablespace, where the data is retained for the duration of either a single transaction or the lifetime of a session. Performing DML on a Global Temporary Table doesn’t generate redo, because the data is stored in a temporary tablespace, but it does generate undo, which in turn generates redo because the undo is stored in a permanent tablespace. Since this redo isn’t required for recovery purposes, wouldn’t it be nice if it wasn’t generated in the first place?

Oracle 12c makes it possible to store the undo generated by DML on global temporary tables in a temporary tablespace. This feature is called Temporary Undo and is controlled by the temp_undo_enabled database parameter, which is modifiable at the system or session level (both forms are sketched after the list below). There are two reasons for enabling this new feature:

  1. To reduce the amount of redo caused by DML on global temporary tables.
  2. To allow DML on global temporary tables in a physical standby database.
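
For reference, enabling the feature at either scope is a plain ALTER statement. A minimal sketch (the session-level form is demonstrated later in this post):

SQL> alter session set temp_undo_enabled = true;  -- this session only
SQL> alter system set temp_undo_enabled = true;   -- instance-wide default

Note that, according to the 12c documentation, once a session has performed DML on a global temporary table its undo location is fixed; changing the parameter afterwards has no effect on that session.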

Using temporary undo to reduce the amount of redo

One reason for using temporary undo is to reduce the amount of redo caused by DML on global temporary tables. All we need to do to demonstrate this is insert a bunch of rows in a global temporary table. So, let’s get started by creating a global temporary table in an Oracle 12c database:

SQL> connect sh/sh
Connected.
SQL> create global temporary table gtt(c1 number(8), c2 char(100));

Table created.

By default the temp_undo_enabled parameter is set to false, so inserting rows into a global temporary table will generate redo, as shown below:

SQL> connect sh/sh
Connected.
SQL> show parameter temp_undo_enabled

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
temp_undo_enabled                    boolean     FALSE

SQL> select name, value from v$mystat natural join v$statname
  2  where name = 'redo size' or name = 'redo entries';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                              2
redo size                                                               720

SQL> insert into gtt select level, level from dual connect by level <= 10000;

10000 rows created.

SQL> select name, value from v$mystat natural join v$statname
  2  where name = 'redo size' or name = 'redo entries';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                            287
redo size                                                             63064

Based on this output we can see that inserting 10,000 rows generated 287 - 2 = 285 redo entries with a total of 63,064 - 720 = 62,344 bytes (roughly 62 KB) of redo.

Let’s repeat this test with temp_undo_enabled set to true and see what happens:

SQL> connect sh/sh
Connected.
SQL> alter session set temp_undo_enabled = true;

Session altered.

SQL> select name, value from v$mystat natural join v$statname
  2  where name = 'redo size' or name = 'redo entries';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                              2
redo size                                                               720

SQL> insert into gtt select level, level from dual connect by level <= 10000;

10000 rows created.

SQL> select name, value from v$mystat natural join v$statname
  2  where name = 'redo size' or name = 'redo entries';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                              3
redo size                                                               992

This time inserting 10,000 rows generated only 3 - 2 = 1 redo entry containing 992 - 720 = 272 bytes of redo. Hopefully this is what you expected ;-)
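
Oracle 12c also exposes temporary undo activity through the V$TEMPUNDOSTAT view. A minimal sketch of checking it (the column names are taken from the 12c reference, so verify them on your release):

SQL> select begin_time, txncount, undoblkcnt
  2  from v$tempundostat order by begin_time;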

Temporary undo and a physical standby database

Another reason for using temporary undo is to allow DML on global temporary tables in a physical standby database opened read-only. To demonstrate this we need an Oracle 12c Data Guard environment, as shown below:

DGMGRL> show configuration;

Configuration - Animals

  Protection Mode: MaxPerformance
  Databases:
    dogs  - Primary database
    sheep - Physical standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

DGMGRL> show database sheep 

Database - sheep

  Role:              PHYSICAL STANDBY
  Intended State:    APPLY-ON
  Transport Lag:     0 seconds (computed 1 second ago)
  Apply Lag:         0 seconds (computed 1 second ago)
  Apply Rate:        19.00 KByte/s
  Real Time Query:   ON
  Instance(s):
    v1210

Database Status:
SUCCESS

Temporary undo is automatically enabled when connected to a physical standby database, so there is no need to set the temp_undo_enabled parameter. Let’s see what happens if we insert rows into a global temporary table in a physical standby database opened read-only:

SQL> connect sh/sh@sheep
Connected.
SQL> select name, value from v$mystat natural join v$statname
  2  where name = 'redo size' or name = 'redo entries';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                              0
redo size                                                                 0

SQL> insert into gtt select level, level from dual connect by level <= 10000;

10000 rows created.

SQL> select name, value from v$mystat natural join v$statname
  2  where name = 'redo size' or name = 'redo entries';

NAME                                                                  VALUE
---------------------------------------------------------------- ----------
redo entries                                                              0
redo size                                                                 0 

Wow, we can insert data into a database opened read-only without generating any redo! This used to be impossible in earlier releases of Oracle. The output below was captured on an Oracle 11g Data Guard environment:

SQL> connect sh/sh@sheep11g
Connected.

SQL> insert into gtt select level, level from dual connect by level <= 10000;
insert into gtt select level, level from dual connect by level <= 10000
            *
ERROR at line 1:
ORA-16000: database open for read-only access

In conclusion

Temporary undo stores the undo generated by DML on global temporary tables in a temporary tablespace, thereby eliminating the need to generate redo for it. This also makes it possible to perform DML on global temporary tables in a physical standby database opened read-only.

Temporary undo is good for performance in general, and for Data Guard performance in particular, because DML on global temporary tables in the physical standby database does not generate redo, and DML on global temporary tables in the primary database generates less redo to be transmitted over the network to the standby database(s).
-Harald

Posted in Oracle | 3 Comments »

Demystifying ASM REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB

Posted by Harald van Breederode on January 3, 2013

A recurring question in my classes is how Oracle Automatic Storage Management (ASM) calculates the REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB disk group values. As usual the answer is: It depends! ;-)

In short: the REQUIRED_MIRROR_FREE_MB value in V$ASM_DISKGROUP indicates how much free space is required in an ASM disk group to restore redundancy after the failure of an ASM disk or ASM failure group. The USABLE_FILE_MB value indicates how much space is available in an ASM disk group considering the redundancy level of the disk group. The question is: How does ASM calculate these values?

The answer can be found by creating ASM disk groups in different configurations and taking a look at the REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB values in V$ASM_DISKGROUP. All we need is a bunch of small disks.

ASM> select label,os_mb from v$asm_disk
  2  where label like 'SAN%' order by label;

LABEL                                OS_MB
------------------------------- ----------
SAN01                                  255
SAN02                                  255
SAN03                                  255
SAN04                                  255
SAN05                                  255
SAN06                                  255

REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB in external redundancy disk groups

Let’s get started by creating an external redundancy disk group on top of the above 6 disks:

ASM> create diskgroup demo external redundancy
  2  disk 'ORCL:san01'
  3  disk 'ORCL:san02'
  4  disk 'ORCL:san03'
  5  disk 'ORCL:san04'
  6  disk 'ORCL:san05'
  7  disk 'ORCL:san06'
  8  attribute 'compatible.asm' = '11.2.0.0.0';

Diskgroup created.

Now we can explore this disk group by querying several columns from V$ASM_DISKGROUP:

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     EXTERN       1530    1468        0   1468

The above output shows that the size of this disk group is 1530MB (TOTAL_MB), of which 1468MB (FREE_MB) is free space. Because this is an external redundancy disk group, all free space is available for storing files, as indicated by the USABLE_FILE_MB column. The REQUIRED_MIRROR_FREE_MB column is zero because ASM does not mirror user data in external redundancy disk groups.

Note: The “missing” 62MB is used for storing ASM metadata.
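
This overhead can be verified directly from the view; given the TOTAL_MB and FREE_MB values above, the following trivial query returns 62:

ASM> select total_mb - free_mb as metadata_mb
  2  from v$asm_diskgroup where name = 'DEMO';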

REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB in normal redundancy disk groups

Next we create a normal redundancy disk group with six failure groups of one disk each.

ASM> create diskgroup demo normal redundancy
  2  failgroup FG1 disk
  3  'ORCL:san01'
  4  failgroup FG2 disk
  5  'ORCL:san02'
  6  failgroup FG3 disk
  7  'ORCL:san03'
  8  failgroup FG4 disk
  9  'ORCL:san04'
 10  failgroup FG5 disk
 11  'ORCL:san05'
 12  failgroup FG6 disk
 13  'ORCL:san06'
 14  attribute 'compatible.asm' = '11.2.0.0.0';

Diskgroup created.

ASM uses failure groups to provide redundancy inside disk groups. Data stored in one failure group is also stored in another failure group to protect it against failures. In a normal redundancy disk group the loss of one failure group can be tolerated without affecting data availability. ASM will restore redundancy by re-creating the lost data in another failure group, but this requires that enough free space is available elsewhere in the disk group to store this data. Because of this, it is useful to know how big a single failure group actually is.

ASM> select failgroup,sum(total_mb) from v$asm_disk
  2  where label like 'SAN%' group by failgroup order by failgroup;

FAILGROUP                      SUM(TOTAL_MB)
------------------------------ -------------
FG1                                      255
FG2                                      255
FG3                                      255
FG4                                      255
FG5                                      255
FG6                                      255

The above output tells us that the size of a single failure group is 255MB. Let’s see how this influences the free space requirements and how it affects usable file space:

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     NORMAL       1530    1365      255    555

The above output indicates that 255MB of free space should be available to allow ASM to restore redundancy after a failure. The value is 255 because that is the size of one failure group. If the failure groups are not equally sized, ASM selects the biggest failure group for this calculation.
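
For unequally sized failure groups, the value ASM would pick can be retrieved with a query along these lines (a sketch, grouping v$asm_disk by failure group just as above):

ASM> select max(fg_mb) as biggest_fg from (
  2    select sum(total_mb) as fg_mb from v$asm_disk
  3    where label like 'SAN%' group by failgroup);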

The V$ASM_DISKGROUP output above also indicates that we can use 555MB for storing files. This value is calculated as follows:

ASM> select trunc((free_mb - required_mirror_free_mb) / 2) as useable
  2  from v$asm_diskgroup where name='DEMO';

   USEABLE
----------
       555

In plain English: the amount of usable file space equals the amount of free space minus the space required to restore redundancy after a failure, divided by the redundancy level of the disk group, which is two in this case: (1365 - 255) / 2 = 555.

Thus, based on the amount of free space, the amount of required free space and the redundancy level, ASM calculates the amount of space available for storing files. If the amount of free space changes, because a file is created, resized or deleted, this value changes as well. This can be demonstrated by creating a data file in this disk group.

SQL> create tablespace foo datafile '+DEMO' size 200m;

Tablespace created.

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     NORMAL       1530     939      255    342

Because we created a 200MB data file the amount of usable file space is reduced as shown above. The reduction is slightly more than the file size because ASM allocated a bit more space for its metadata. Let’s add another file to our tablespace:

SQL> alter tablespace foo add datafile '+DEMO' size 200m;

Tablespace altered.

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     NORMAL       1530     534      255    139

As expected the amount of usable file space is reduced by slightly more than the actual file size. Can we create another 200MB data file?

SQL> alter tablespace foo add datafile '+DEMO' size 200m;

Tablespace altered.

We could indeed create another 200MB data file, despite the fact that ASM indicated that only 139MB of usable file space was available. It is important to note that ASM does not enforce the amount of free space indicated by REQUIRED_MIRROR_FREE_MB. Let’s take a look at the value of USABLE_FILE_MB now:

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     NORMAL       1530     129      255    -63

The value has turned negative! This means that if we encounter a failure, ASM will be unable to restore redundancy because there is not enough space to do so. It is therefore very important to monitor USABLE_FILE_MB! If it becomes negative for a particular disk group, you should either increase the size of the disk group or free up some space in it. For now we will just drop the tablespace ;-)

SQL> drop tablespace foo;

Tablespace dropped.
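
To keep an eye on this in the future, a check along these lines (a minimal sketch) could be run periodically against every disk group:

ASM> select name, usable_file_mb from v$asm_diskgroup
  2  where usable_file_mb < 0;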

Now let’s create another normal redundancy disk group, but this time with only three failure groups each with two disks instead of six failure groups each with only one disk:

ASM> create diskgroup demo normal redundancy
  2  failgroup FG1 disk
  3  'ORCL:san01',
  4  'ORCL:san02'
  5  failgroup FG2 disk
  6  'ORCL:san03',
  7  'ORCL:san04'
  8  failgroup FG3 disk
  9  'ORCL:san05',
 10  'ORCL:san06'
 11  attribute 'compatible.asm' = '11.2.0.0.0';

Diskgroup created.

Now that the disk group is created we again take a look at the sizes of its failure groups:

ASM> select failgroup,sum(total_mb) from v$asm_disk
  2  where label like 'SAN%' group by failgroup order by failgroup;

FAILGROUP                      SUM(TOTAL_MB)
------------------------------ -------------
FG1                                      510
FG2                                      510
FG3                                      510

All three failure groups are equally sized at 510MB. Let’s see how this is reflected in the space administration columns in V$ASM_DISKGROUP:

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     NORMAL       1530    1365      510    427

As expected, REQUIRED_MIRROR_FREE_MB is now 510, which is of course equal to the size of the biggest failure group. This results in (1365 - 510) / 2 = 427MB of usable file space. As shown before, this value is calculated as follows:

ASM> select trunc((free_mb - required_mirror_free_mb) / 2) as useable
  2  from v$asm_diskgroup where name='DEMO';

   USEABLE
----------
       427

Next we create yet another normal redundancy disk group, but this time with only two failure groups each with three disks instead of three failure groups each with two disks:

ASM> create diskgroup demo normal redundancy
  2  failgroup FG1 disk
  3  'ORCL:san01',
  4  'ORCL:san02',
  5  'ORCL:san03'
  6  failgroup FG2 disk
  7  'ORCL:san04',
  8  'ORCL:san05',
  9  'ORCL:san06'
 10  attribute 'compatible.asm' = '11.2.0.0.0';

Diskgroup created.

Again we determine the size of all failure groups:

ASM> select failgroup,sum(total_mb) from v$asm_disk
  2  where label like 'SAN%' group by failgroup order by failgroup;

FAILGROUP                      SUM(TOTAL_MB)
------------------------------ -------------
FG1                                      765
FG2                                      765

Both failure groups are equally sized at 765MB. What will the impact be on the space administration?

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     NORMAL       1530    1416      255    580

Hmmm, REQUIRED_MIRROR_FREE_MB is only 255 (the size of one disk) instead of 765 (the size of one failure group) as one might expect. Apparently ASM only anticipates the failure of a single disk and no longer anticipates the failure of a complete failure group.

When you think about it, this actually makes sense! As stated at the beginning of this posting, ASM performs mirroring by storing the same data in different failure groups. A normal redundancy disk group requires at least two failure groups, and a high redundancy disk group requires at least three. How can ASM restore redundancy in a normal redundancy disk group when there is only one failure group left? It can’t! Therefore ASM anticipates only the loss of a single disk.

For completeness, this is how ASM calculated the USABLE_FILE_MB value of 580:

ASM> select trunc((free_mb - required_mirror_free_mb) / 2) as useable
  2  from v$asm_diskgroup where name='DEMO';

   USEABLE
----------
       580

REQUIRED_MIRROR_FREE_MB and USABLE_FILE_MB in high redundancy disk groups

Now that we covered external and normal redundancy disk groups, we will take a look at high redundancy disk groups. So, let’s get started by creating a high redundancy disk group with six failure groups of one disk each and determine the size of its failure groups:

ASM> create diskgroup demo high redundancy
  2  failgroup FG1 disk
  3  'ORCL:san01'
  4  failgroup FG2 disk
  5  'ORCL:san02'
  6  failgroup FG3 disk
  7  'ORCL:san03'
  8  failgroup FG4 disk
  9  'ORCL:san04'
 10  failgroup FG5 disk
 11  'ORCL:san05'
 12  failgroup FG6 disk
 13  'ORCL:san06'
 14  attribute 'compatible.asm' = '11.2.0.0.0';

Diskgroup created.

ASM> select failgroup,sum(total_mb) from v$asm_disk
  2  where label like 'SAN%' group by failgroup order by failgroup;

FAILGROUP                      SUM(TOTAL_MB)
------------------------------ -------------
FG1                                      255
FG2                                      255
FG3                                      255
FG4                                      255
FG5                                      255
FG6                                      255

As expected all failure groups are equally sized at 255MB. Could there be a difference between the required free and usable disk space for a high redundancy disk group and a normal redundancy disk group with the same number of disks and failure groups? Let’s have a look:

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     HIGH         1530    1365      510    285

REQUIRED_MIRROR_FREE_MB is 510, which is equal to the sum of the sizes of two failure groups (or of the two biggest failure groups if they are not equally sized). This is because a high redundancy disk group can tolerate the loss of two failure groups without affecting data availability.
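
For unequally sized failure groups, the sum of the two biggest ones could be computed with something like this (a sketch that orders the failure group sizes descending and keeps the top two):

ASM> select sum(fg_mb) as req_free from (
  2    select sum(total_mb) as fg_mb from v$asm_disk
  3    where label like 'SAN%' group by failgroup
  4    order by fg_mb desc)
  5  where rownum <= 2;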

USABLE_FILE_MB is 285 and is calculated as follows:

ASM> select trunc((free_mb - required_mirror_free_mb) / 3) as useable
  2  from v$asm_diskgroup where name='DEMO';

   USEABLE
----------
       285

Finally we create another high redundancy disk group but this time with three failure groups each with two disks instead of six failure groups of one disk each, followed by querying the failure group sizes:

ASM> create diskgroup demo high redundancy
  2  failgroup FG1 disk
  3  'ORCL:san01',
  4  'ORCL:san02'
  5  failgroup FG2 disk
  6  'ORCL:san03',
  7  'ORCL:san04'
  8  failgroup FG3 disk
  9  'ORCL:san05',
 10  'ORCL:san06'
 11  attribute 'compatible.asm' = '11.2.0.0.0';

Diskgroup created.

ASM> select failgroup,sum(total_mb) from v$asm_disk
  2  where label like 'SAN%' group by failgroup order by failgroup;

FAILGROUP                      SUM(TOTAL_MB)
------------------------------ -------------
FG1                                      510
FG2                                      510
FG3                                      510

Hopefully the above output isn’t surprising, because we have already created quite a number of disk groups on the disks used to write this posting. If you managed to follow and understand everything so far, the next output should match your expectations:

ASM> select name, state, type, total_mb, free_mb, required_mirror_free_mb req_free, usable_file_mb use_mb
  2  from v$asm_diskgroup where name = 'DEMO';

NAME       STATE       TYPE     TOTAL_MB FREE_MB REQ_FREE USE_MB
---------- ----------- ------ ---------- ------- -------- ------
DEMO       MOUNTED     HIGH         1530    1365      510    285

The 510 listed for REQUIRED_MIRROR_FREE_MB is the size of two disks, not the size of a single failure group. The reasoning is almost the same as for a normal redundancy disk group with only two failure groups, but this time we have a high redundancy disk group with only three failure groups. If one or two failure groups are lost, ASM cannot restore redundancy, as that would require at least three surviving failure groups.

Below is the calculation for the 285 listed as USABLE_FILE_MB:

ASM> select trunc((free_mb - required_mirror_free_mb) / 3) as useable
  2  from v$asm_diskgroup where name='DEMO';

   USEABLE
----------
       285

In summary

REQUIRED_MIRROR_FREE_MB is the amount of free space required to restore redundancy after the worst failure that ASM can tolerate without affecting data availability. This amount depends on the redundancy level and the number of failure groups in the disk group.

Normal redundancy disk groups with at least three failure groups require an amount of free space that is equal to the size of a single failure group. Normal redundancy disk groups with only two failure groups require an amount of free space that is equal to the size of a single disk.

High redundancy disk groups with at least four failure groups require an amount of free space that is equal to the sum of the size of two failure groups. High redundancy disk groups with only three failure groups require an amount of free space that is equal to the sum of the size of two disks.

USABLE_FILE_MB is the amount of disk space available for storing user data. This amount depends on the total size of the disk group, the disk group redundancy, and the amount of free space that ASM requires to restore redundancy after a failure. USABLE_FILE_MB is calculated as follows:

USABLE_FILE_MB = (FREE_MB - REQUIRED_MIRROR_FREE_MB) / [2|3]
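
Expressed as a single query (a sketch; DECODE maps the redundancy type to the divisor, with external redundancy dividing by one):

ASM> select name, type,
  2         trunc((free_mb - required_mirror_free_mb) /
  3               decode(type, 'NORMAL', 2, 'HIGH', 3, 1)) as usable_mb
  4  from v$asm_diskgroup;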

-Harald

Posted in Oracle | 15 Comments »

Understanding Linux Load Average – Part 3

Posted by Harald van Breederode on May 28, 2012

In part 1 we performed a series of experiments to explore the relation between CPU utilization and Linux load average. We concluded that the load average is influenced by processes running on or waiting for the CPU. Based on experiments in part 2 we came to the conclusion that processes that are performing disk I/O also influence the load average on a Linux system. In this posting we will do another experiment to find out if the Linux load average is also affected by processes performing network I/O.

Network I/O and load average

To check whether a correlation exists between processes performing network I/O and the load average, we will start 10 processes generating network I/O on an otherwise idle system and collect various performance-related statistics using the sar command. Note: My load-gen script uses the ping command to generate network I/O.
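
The script itself is not shown in this post; purely as an illustration, a hypothetical stand-in that produces similar loopback traffic might look like this (flood ping requires root):

#!/bin/bash
# Hypothetical stand-in for the load-gen script used below:
# "load-gen network <n>" starts <n> background flood pings
# against localhost, generating traffic on the lo interface.
if [ "$1" = "network" ]; then
    echo "Starting $2 network load processes."
    for i in $(seq 1 "$2"); do
        ping -f localhost > /dev/null 2>&1 &
    done
fi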

$ load-gen network 10
Starting 10 network load processes.
$ sar -n DEV 30 6
Linux 2.6.32-300.20.1.el5uek (roger.example.com) 	04/21/2012

09:38:01 PM     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
09:38:31 PM        lo  88953.60  88953.60 135920963.87 135920963.87      0.00      0.00      0.00
09:38:31 PM      eth1      0.13      0.17     11.33     62.33      0.00      0.00      0.00
09:38:31 PM      eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00

09:38:31 PM     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
09:39:01 PM        lo  89295.13  89295.13 136442626.93 136442626.93      0.00      0.00      0.00
09:39:01 PM      eth1      0.03      0.03      2.60     48.33      0.00      0.00      0.00
09:39:01 PM      eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00

09:39:01 PM     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
09:39:31 PM        lo  89364.38  89364.38 136548566.91 136548566.91      0.00      0.00      0.00
09:39:31 PM      eth1      0.10      0.10      7.34     47.30      0.00      0.00      0.03
09:39:31 PM      eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00

09:39:31 PM     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
09:40:01 PM        lo  89410.80  89410.80 136619365.60 136619365.60      0.00      0.00      0.00
09:40:01 PM      eth1      0.03      0.03      2.60     48.33      0.00      0.00      0.00
09:40:01 PM      eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00

09:40:01 PM     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
09:40:31 PM        lo  89502.30  89502.30 136759314.53 136759314.53      0.00      0.00      0.00
09:40:31 PM      eth1      0.23      0.27     20.60     59.33      0.00      0.00      0.00
09:40:31 PM      eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00

09:40:31 PM     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
09:41:01 PM        lo  89551.52  89551.52 136834718.24 136834718.24      0.00      0.00      0.00
09:41:01 PM      eth1      0.03      0.03      2.60     48.35      0.00      0.00      0.00
09:41:01 PM      eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00

Average:        IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/s  rxmcst/s
Average:           lo  89346.27  89346.27 136520905.51 136520905.51      0.00      0.00      0.00
Average:         eth1      0.09      0.11      7.85     52.33      0.00      0.00      0.01
Average:         eth0      0.00      0.00      0.00      0.00      0.00      0.00      0.00 

The above output shows that the lo interface sent and received almost 90 thousand packets per second, amounting to roughly 136 million bytes of traffic per second. The other two interfaces had virtually no traffic at all, because my network load processes are pinging localhost. Let’s have a look at the CPU utilization before taking a look at the run-queue utilization and load average.

$ sar -P ALL -u 30 6
Linux 2.6.32-300.20.1.el5uek (roger.example.com) 	04/21/2012

09:38:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:38:31 PM       all     13.90      0.00     86.10      0.00      0.00      0.00
09:38:31 PM         0     13.60      0.00     86.40      0.00      0.00      0.00
09:38:31 PM         1     14.17      0.00     85.83      0.00      0.00      0.00

09:38:31 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:39:01 PM       all     13.82      0.00     86.18      0.00      0.00      0.00
09:39:01 PM         0     13.30      0.00     86.70      0.00      0.00      0.00
09:39:01 PM         1     14.37      0.00     85.63      0.00      0.00      0.00

09:39:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:39:31 PM       all     13.84      0.00     86.16      0.00      0.00      0.00
09:39:31 PM         0     13.30      0.00     86.70      0.00      0.00      0.00
09:39:31 PM         1     14.37      0.00     85.63      0.00      0.00      0.00

09:39:31 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:40:01 PM       all     13.82      0.00     86.18      0.00      0.00      0.00
09:40:01 PM         0     14.10      0.00     85.90      0.00      0.00      0.00
09:40:01 PM         1     13.53      0.00     86.47      0.00      0.00      0.00

09:40:01 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:40:31 PM       all     13.75      0.00     86.25      0.00      0.00      0.00
09:40:31 PM         0     14.27      0.00     85.73      0.00      0.00      0.00
09:40:31 PM         1     13.20      0.00     86.80      0.00      0.00      0.00

09:40:31 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:41:01 PM       all     13.55      0.00     86.45      0.00      0.00      0.00
09:41:01 PM         0     13.83      0.00     86.17      0.00      0.00      0.00
09:41:01 PM         1     13.27      0.00     86.73      0.00      0.00      0.00

Average:          CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:          all     13.78      0.00     86.22      0.00      0.00      0.00
Average:            0     13.73      0.00     86.27      0.00      0.00      0.00
Average:            1     13.82      0.00     86.18      0.00      0.00      0.00
 

On average the CPUs spent 14% of their time running code in user mode and 86% running code in kernel mode. This is because the Linux kernel has to work quite hard to handle this amount of network traffic. The question is of course: what effect does this have on the load average?

$ sar -q 30 6
Linux 2.6.32-300.20.1.el5uek (roger.example.com) 	04/21/2012

09:38:01 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
09:38:31 PM        10       319      4.03      1.93      1.86
09:39:01 PM        10       319      6.46      2.72      2.12
09:39:31 PM        10       319      7.85      3.41      2.37
09:40:01 PM        10       319      8.69      4.04      2.61
09:40:31 PM        10       319      9.14      4.59      2.84
09:41:01 PM        10       313      9.55      5.12      3.07
Average:           10       318      7.62      3.63      2.48
 

The above sar output shows that the run-queue was constantly occupied by 10 processes and that the 1-minute load average slowly climbed towards 10, as one might expect by now ;-) This could be an indication that the load average is influenced by processes performing network I/O. But maybe the ping processes are simply using large amounts of CPU time, thereby forcing the load average up. To find out, we will take a look at the top output.

top - 21:41:02 up 11:25,  1 user,  load average: 9.51, 5.19, 3.11
Tasks: 215 total,   9 running, 206 sleeping,   0 stopped,   0 zombie
Cpu(s): 13.6%us, 34.0%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi, 52.4%si,  0.0%st
Mem:   3074820k total,  2567640k used,   507180k free,   221652k buffers
Swap:  5144568k total,        0k used,  5144568k free,  1161696k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
30118 root      20   0  8124  768  640 R  1.1  0.0   0:00.32 ping               
30121 root      20   0  8124  768  640 R  0.9  0.0   0:00.27 ping               
30126 root      20   0  8124  772  640 R  0.5  0.0   0:00.15 ping               
30127 root      20   0  8124  772  640 R  0.5  0.0   0:00.15 ping               
30134 root      20   0  8124  772  640 R  0.4  0.0   0:00.13 ping               
30135 root      20   0  8124  768  640 R  0.4  0.0   0:00.11 ping               
30136 root      20   0  8124  764  640 R  0.4  0.0   0:00.11 ping               
30139 root      20   0  8124  768  640 R  0.2  0.0   0:00.05 ping               
27675 hbreeder  20   0 12864 1212  836 R  0.1  0.0   0:00.15 top                
 

It is clear from the above output that the ping processes are not using huge amounts of CPU time at all, which eliminates CPU utilization as the driving force behind the load average. The output also reveals that the high CPU utilization is mainly caused by handling software interrupts: 52% in this case.

Conclusion

Based on this experiment we can conclude that processes performing network I/O have an effect on the Linux load average. And based on the experiments in the previous two postings, we already concluded that processes running on, or waiting for, the CPU and processes performing disk I/O have such an effect as well. Thus the three factors that drive the load average on a Linux system are processes that are on the run-queue because they:

  • Run on, or are waiting for, the CPU

  • Perform disk I/O

  • Perform network I/O

Summary

The Linux load average is driven by the three factors mentioned above, but how does one interpret a load average that seems too high? The first step is to look at the CPU utilization. If it isn’t 100% and the load average is above the number of CPUs in the system, the load average is primarily driven by processes performing disk I/O, network I/O, or a combination of both. Finding the processes responsible for most of the I/O isn’t straightforward, because there aren’t many tools available to assist you. A very useful tool is iotop, but it doesn’t seem to work on Oracle Linux 5; it does work on Oracle Linux 6. Another tool is atop, but it requires one or more kernel patches to be useful.

If the CPU utilization is 100% and the load average is above the number of CPUs in the system, the load average is either driven entirely by processes running on, or waiting for, the CPU, or by a combination of those processes and processes performing I/O (which in turn may be a mix of disk and network I/O). Using top is an easy way to verify whether CPU utilization alone is responsible for the current load average or whether the other two factors play a role as well. Knowing your system helps a lot when it comes to troubleshooting performance problems, and taking performance baselines using sar is always a good thing to do.
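
On Oracle Linux the sysstat package collects such baselines automatically via cron; reviewing an earlier day is then just a matter of reading the daily file (the path below is the sysstat default on Oracle Linux 5/6, and the day number is an example):

$ sar -q -f /var/log/sa/sa21    # run-queue sizes and load averages
$ sar -u -f /var/log/sa/sa21    # CPU utilization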
-Harald

Posted in Linux | 24 Comments »

 