The Dutch Prutser's Blog

By: Harald van Breederode

  • Disclaimer

    The views expressed on this blog are my own and do not necessarily reflect the views of Oracle.

Understanding Linux Load Average – Part 1

Posted by Harald van Breederode on April 23, 2012

A frequently asked question in my classroom is “What is the meaning of load average and when is it too high?”. This may sound like an easy question, and I really thought it was, but recently I discovered that things aren’t always as easy as they seem. In this first post of a three-part series I will explain what the Linux load average means and how to diagnose load averages that may seem too high.

Obtaining the current load average is as simple as issuing the uptime command:

$ uptime
21:49:05 up 11:33,  1 user,  load average: 10.52, 6.03, 3.78

But what is the meaning of these 3 numbers? Basically load average is the run-queue utilization averaged over the last minute, the last 5 minutes and the last 15 minutes. The run-queue is a list of processes waiting for a resource to become available inside the Linux operating system. The example above indicates that on average there were 10.52 processes waiting to be scheduled on the run-queue measured over the last minute.
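The same three numbers can be read straight from the kernel via /proc/loadavg, whose first three fields are the 1-, 5- and 15-minute averages shown by uptime. A minimal sketch (parse_loadavg is a hypothetical helper name; the sample line mirrors the uptime output above):

```shell
# /proc/loadavg holds the 1-, 5- and 15-minute load averages, followed by
# runnable/total scheduling entities and the most recently created PID.
parse_loadavg() {
  set -- $1   # split the /proc/loadavg-style line into positional parameters
  echo "1-min: $1  5-min: $2  15-min: $3"
}

# On a live system: parse_loadavg "$(cat /proc/loadavg)"
# Sample line matching the uptime output above:
parse_loadavg "10.52 6.03 3.78 3/290 27348"
```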

The questions are of course: Which processes are on the run-queue? And what are they waiting for? Why not find the answer to these questions by performing a series of experiments?

CPU utilization and load average

To be able to perform the necessary experiments I wrote a few shell scripts to generate various types of load on my Linux box. The first experiment is to start one CPU load process, on an otherwise idle system, and watch its effect on the load average using the sar command:

$ load-gen cpu
Starting 1 CPU load process.
$ sar -q 30 6
Linux 2.6.32-300.20.1.el5uek (roger.example.com) 	04/21/2012

09:06:54 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
09:07:24 PM         1       290      0.39      0.09      0.15
09:07:54 PM         1       290      0.63      0.18      0.18
09:08:24 PM         1       290      0.77      0.26      0.20
09:08:54 PM         1       290      0.86      0.33      0.22
09:09:24 PM         1       290      0.97      0.40      0.25
09:09:54 PM         1       288      0.98      0.46      0.28
Average:            1       290      0.77      0.29      0.21 

The above sar output reports the load average 6 times at 30-second intervals. It shows that there was constantly 1 process on the run-queue, causing the 1-minute load average to slowly climb to a value of 1 and then stabilize there. The 5-minute load average will continue to climb for a few more minutes before it also stabilizes at a value of 1, and the same is true for the 15-minute load average, assuming the run-queue utilization remains the same.
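The load-gen script itself is not listed in the post; a minimal sketch of what such a CPU load generator might look like (load_gen is a hypothetical name, built around the plain shell busy loop the author shares in the comments):

```shell
# Hypothetical sketch of a CPU load generator like the post's load-gen.
# "load_gen cpu N" forks N busy loops; each keeps one process permanently
# runnable, adding one to the run-queue and hence to the load average.
load_gen() {
  kind=$1
  count=${2:-1}
  if [ "$kind" != cpu ]; then
    echo "usage: load_gen cpu [count]" >&2
    return 1
  fi
  echo "Starting $count CPU load process(es)."
  for _ in $(seq "$count"); do
    while :; do :; done &     # pure CPU work, no I/O
    BUSY_PIDS="$BUSY_PIDS $!"
  done
}
```

The background loops run until killed; stop the load again with `kill $BUSY_PIDS`.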

The next step is to take a look at the CPU utilization to check if there is a correlation between it and the load average. While measuring the load average, I also had sar running to report the CPU utilization.

$ sar -u 30 6
Linux 2.6.32-300.20.1.el5uek (roger.example.com) 	04/21/2012

09:06:54 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:07:24 PM       all     50.48      0.00      0.65      0.00      0.00     48.87
09:07:54 PM       all     50.40      0.00      0.48      0.02      0.00     49.10
09:08:24 PM       all     50.03      0.00      0.57      0.02      0.00     49.39
09:08:54 PM       all     49.97      0.00      0.52      0.00      0.00     49.52
09:09:24 PM       all     50.10      0.00      0.52      0.02      0.00     49.37
09:09:54 PM       all     50.23      0.00      0.55      0.02      0.00     49.21
Average:          all     50.20      0.00      0.55      0.01      0.00     49.24

This shows that overall the system spent roughly 50% of its time running user processes and the other 50% doing nothing. Thus only half of the machine’s capacity was needed to run the CPU load that caused a load average of 1. Isn’t that strange? Not if you know that the machine is equipped with two processors: while one CPU was busy running the load, the other CPU was idle, resulting in an overall CPU utilization of 50%.
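A handy rule of thumb follows from this: divide the load average by the number of processors to see how loaded the machine really is. A small sketch, assuming a Linux system where nproc is available:

```shell
# Normalize the 1-minute load average by the number of processors:
# a load of 1.00 on the 2-CPU box from the post works out to 0.50,
# i.e. half of the machine's capacity.
cpus=$(nproc)
load=$(cut -d' ' -f1 /proc/loadavg)
awk -v l="$load" -v c="$cpus" 'BEGIN { printf "load per CPU: %.2f\n", l / c }'
```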

Personally I prefer using sar to peek around in a busy Linux system, but other people tend to use top for the same purpose. This is what top had to report about the situation we just studied with sar:

$ top -bi -d30 -n7
top - 21:09:55 up 10:54,  1 user,  load average: 0.98, 0.46, 0.28
Tasks: 188 total,   2 running, 186 sleeping,   0 stopped,   0 zombie
Cpu(s): 50.2%us,  0.5%sy,  0.0%ni, 49.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3074820k total,  2539340k used,   535480k free,   218600k buffers
Swap:  5144568k total,        0k used,  5144568k free,  1160120k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
27348 hbreeder  20   0 63836 1068  908 R 99.8  0.0   3:00.31 busy-cpu           
27354 hbreeder  20   0 12756 1184  836 R  0.0  0.0   0:00.12 top                

The -bi command line options tell top to run in batch mode, instead of full-screen mode, and to ignore idle processes. The -d30 and -n7 options instruct top to produce 7 sets of output with a delay of 30 seconds between them. The output above is the last of the 7 sets top produced.

Besides everything we already discovered in the various sar outputs, top gives us useful information about the processes consuming CPU time, as well as about physical and virtual memory usage. It is interesting to see that the busy-cpu process consumes 99.8% CPU while the overall CPU utilization is slightly over 50%, leaving 49% idle time.

The explanation is that top reports the CPU utilization averaged over all processors in the header section of its output, while the per-process CPU utilization is not averaged over the total number of processors.

We can verify this by using the -P ALL command line option to make sar report the CPU utilization per processor as well as the averaged values.

$ sar -P ALL -u 30 6
Linux 2.6.32-300.20.1.el5uek (roger.example.com) 	04/21/2012

09:06:54 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:07:24 PM       all     50.48      0.00      0.65      0.00      0.00     48.87
09:07:24 PM         0      0.97      0.00      1.27      0.00      0.00     97.77
09:07:24 PM         1     99.97      0.00      0.03      0.00      0.00      0.00

09:07:24 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:07:54 PM       all     50.41      0.00      0.48      0.02      0.00     49.09
09:07:54 PM         0      0.83      0.00      0.97      0.00      0.00     98.20
09:07:54 PM         1    100.00      0.00      0.00      0.00      0.00      0.00

09:07:54 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:08:24 PM       all     50.03      0.00      0.57      0.02      0.00     49.38
09:08:24 PM         0     75.89      0.00      0.87      0.00      0.00     23.24
09:08:24 PM         1     24.17      0.00      0.27      0.03      0.00     75.53

09:08:24 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:08:54 PM       all     49.95      0.00      0.52      0.00      0.00     49.53
09:08:54 PM         0     81.03      0.00      0.77      0.00      0.00     18.21
09:08:54 PM         1     18.91      0.00      0.23      0.00      0.00     80.86

09:08:54 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:09:24 PM       all     50.11      0.00      0.52      0.02      0.00     49.36
09:09:24 PM         0     57.05      0.00      0.93      0.03      0.00     41.99
09:09:24 PM         1     43.12      0.00      0.17      0.03      0.00     56.68

09:09:24 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:09:54 PM       all     50.23      0.00      0.55      0.02      0.00     49.21
09:09:54 PM         0     19.94      0.00      0.97      0.00      0.00     79.09
09:09:54 PM         1     80.56      0.00      0.07      0.00      0.00     19.37

Average:          CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:          all     50.20      0.00      0.55      0.01      0.00     49.24
Average:            0     39.28      0.00      0.96      0.01      0.00     59.75
Average:            1     61.12      0.00      0.13      0.01      0.00     38.74

This output confirms that most of the time only one of the two available processors was busy resulting in an overall averaged CPU utilization of 50.2%.

The next experiment is to add a second CPU load process to the still running first CPU load process. This will increase the number of processes on the run-queue from 1 to 2. What effect will this have on the load average?

$ load-gen cpu
Starting 1 CPU load process.
$ sar -q 30 6
Linux 2.6.32-300.20.1.el5uek (roger.example.com) 	04/21/2012

09:09:55 PM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
09:10:25 PM         2       291      1.38      0.60      0.33
09:10:55 PM         2       291      1.62      0.74      0.38
09:11:25 PM         2       291      1.77      0.86      0.43
09:11:55 PM         2       291      1.86      0.96      0.48
09:12:25 PM         2       291      1.91      1.06      0.53
09:12:55 PM         2       291      1.95      1.15      0.57
Average:            2       291      1.75      0.90      0.45
 

The output above shows that there are now indeed 2 processes on the run-queue and that, as a result, the load average is climbing towards a value of 2. Because there are now 2 processes hogging the CPU, we can expect the overall averaged CPU utilization to be close to 100%. The top output below confirms this:

$ top -bi -d30 -n7
top - 21:12:55 up 10:57,  1 user,  load average: 1.95, 1.15, 0.57
Tasks: 189 total,   3 running, 186 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.3%us,  0.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3074820k total,  2540968k used,   533852k free,   218756k buffers
Swap:  5144568k total,        0k used,  5144568k free,  1160212k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
27377 hbreeder  20   0 63836 1064  908 R 99.4  0.0   2:59.45 busy-cpu           
27348 hbreeder  20   0 63836 1068  908 R 98.8  0.0   5:59.18 busy-cpu           
27383 hbreeder  20   0 12756 1188  836 R  0.1  0.0   0:00.13 top                

Please note that top reports 2 processes using nearly 100% CPU time. Using sar we can verify that indeed both processors are now fully utilized.

$ sar -P ALL -u 30 6
Linux 2.6.32-300.20.1.el5uek (roger.example.com) 	04/21/2012

09:09:55 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:10:25 PM       all     99.22      0.00      0.78      0.00      0.00      0.00
09:10:25 PM         0     98.60      0.00      1.40      0.00      0.00      0.00
09:10:25 PM         1     99.83      0.00      0.17      0.00      0.00      0.00

09:10:25 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:10:55 PM       all     99.32      0.00      0.68      0.00      0.00      0.00
09:10:55 PM         0     98.70      0.00      1.30      0.00      0.00      0.00
09:10:55 PM         1     99.90      0.00      0.10      0.00      0.00      0.00

09:10:55 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:11:25 PM       all     99.28      0.00      0.72      0.00      0.00      0.00
09:11:25 PM         0     98.70      0.00      1.30      0.00      0.00      0.00
09:11:25 PM         1     99.90      0.00      0.10      0.00      0.00      0.00

09:11:25 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:11:55 PM       all     99.27      0.00      0.73      0.00      0.00      0.00
09:11:55 PM         0     98.67      0.00      1.33      0.00      0.00      0.00
09:11:55 PM         1     99.87      0.00      0.13      0.00      0.00      0.00

09:11:55 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:12:25 PM       all     99.25      0.00      0.75      0.00      0.00      0.00
09:12:25 PM         0     98.60      0.00      1.40      0.00      0.00      0.00
09:12:25 PM         1     99.90      0.00      0.10      0.00      0.00      0.00

09:12:25 PM       CPU     %user     %nice   %system   %iowait    %steal     %idle
09:12:55 PM       all     99.32      0.00      0.68      0.00      0.00      0.00
09:12:55 PM         0     98.77      0.00      1.23      0.00      0.00      0.00
09:12:55 PM         1     99.90      0.00      0.10      0.00      0.00      0.00

Average:          CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:          all     99.27      0.00      0.73      0.00      0.00      0.00
Average:            0     98.67      0.00      1.33      0.00      0.00      0.00
Average:            1     99.88      0.00      0.12      0.00      0.00      0.00 

The final experiment is to add 3 additional CPU load processes to check if we can force the load average to go up any further now that we are already consuming all available CPU resources on the system.

$ load-gen cpu 3
Starting 3 CPU load processes.
$ top -bi -d30 -n7
top - 21:21:59 up 11:06,  1 user,  load average: 4.91, 3.47, 2.41
Tasks: 193 total,   6 running, 186 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.0%us,  0.7%sy,  0.3%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:   3074820k total,  2570552k used,   504268k free,   219180k buffers
Swap:  5144568k total,        0k used,  5144568k free,  1160512k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
27408 hbreeder  20   0 63836 1068  908 R 39.9  0.0   4:09.41 busy-cpu           
27377 hbreeder  20   0 63836 1064  908 R 39.8  0.0   8:42.65 busy-cpu           
27348 hbreeder  20   0 63836 1068  908 R 39.6  0.0  10:09.95 busy-cpu           
27477 hbreeder  20   0 63836 1064  908 R 39.4  0.0   1:11.19 busy-cpu           
27436 hbreeder  20   0 63836 1064  908 R 38.9  0.0   2:39.25 busy-cpu           
27483 hbreeder  20   0 12756 1192  836 R  0.1  0.0   0:00.13 top                

We managed to drive the load average up to 5 ;-) Because there are only 2 processors in the system and 5 processes fighting for CPU time, each process gets only 40% of the available 200% CPU time.
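The arithmetic behind that 40% can be sketched directly: top counts each processor as 100%, so two CPUs offer 200% of CPU time to divide fairly among the five busy loops.

```shell
# Two CPUs = 200% of CPU time in top's accounting; five runnable
# processes share it fairly, so each gets 200 / 5 = 40%.
awk 'BEGIN { cpus = 2; procs = 5; printf "%.0f%% per process\n", 100 * cpus / procs }'
```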

Conclusion

Based on all these experiments we can conclude that CPU utilization clearly influences the load average of a Linux system. If the load average is above the total number of processors in the system, we could conclude that the system is overloaded, but this assumes that nothing else influences the load average. Is CPU utilization indeed the only factor that drives the Linux load average? Stay tuned for part two!
-Harald


23 Responses to “Understanding Linux Load Average – Part 1”

  1. Reblogged this on Raheel's Blog.

  2. doktersil said

    Clear article Harald! I like it very much. Enough information for a DBA without complicating things! Cheers, Simone Pedroso

  3. djeday84 said

    Can you provide the scripts used to reproduce the load?

    • Harald van Breederode said

      Hi,

      The CPU load script contains the following 4 lines of code:
      while :
      do
      :
      done

      • Hi Harald,

        Good topic. One little nit, though. That can’t be your script because ; (semicolon) is not a shell builtin…or do you have a script or a.out in your path called “;” :-)

      • Harald van Breederode said

        Hi Kevin,

        You are absolutely right! I made a typo while entering the comment. The ; should be a :. I’ve corrected it right away.
        -Harald

  4. Narendra said

    Harald,

    Excellent. Clear yet simple for somebody like me who is new to Linux to understand.
    Can’t wait for subsequent parts.
    Thanks in advance

  5. […] is the meaning of load average and when is it too high? Harald van Breederode has the […]

  6. Amir Hameed said

    It seems that the CPU run-queue is reported differently on Linux than on Solaris. If I am interpreting it correctly, on Linux the run-queue shows the actual number of running processes, while on Solaris it shows the number of processes that are not running yet but are waiting to be put on a CPU. I ran the same test as shown above on my Solaris server, and the run-queue only starts to show a value greater than zero when the number of load processes exceeds the number of CPUs on the server.

    • Harald van Breederode said

      Hi Amir,

      Yes, that is correct. On Linux the run-queue shows the number of running (and waiting) processes. I haven’t verified your statement about Solaris, but I believe it is indeed true. I haven’t looked at Solaris for a very long time ;-) But if my memory serves me correctly, interpreting the load average on Solaris is quite different.
      -Harald

  7. dincer salih kurnaz said

    Hi

    I use atop and iostat to measure the load.

    Dinçer

  8. Tom Bouwman said

    Here is another good article on load of Linux systems:

    http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages

    • Harald van Breederode said

      Hi Tom,

      Thanx for the pointer to another load average article. However, that article states that the load average is only affected by CPU utilization, which is clearly not the case. This is a common misunderstanding, hence my postings on this subject.
      -Harald

  9. I don’t understand why people keep using such a broken indicator. It adds values that count in units (the number of processes waiting for cores) to values that can count in the tens or hundreds (the number of processes waiting for I/O). So a value of 10 on the same machine can be a non-issue (10 processes waiting for disk I/O) or a heavy load (10 processes waiting for CPU).
    It was designed for computers with one core and one IDE disk, both components being single-tasked. Those times are long gone.

    • Harald van Breederode said

      Hi Fabrice,

      Thanx for sharing your opinion. But besides telling me what I shouldn’t use, you may as well tell me what to use instead.
      -Harald

      • vmstat/iostat are useful tools. There are a lot of values in /proc too. There are just two of them that one should never use, because they haven’t been updated since the days of IDE disks and single-core computers:
        -load average.
        -svctm in iostat -x.
        But they are wrong only on Linux. BSD/Solaris get them right.

  10. […] Understanding Linux Load Average – Part 1 […]

  11. dong ma said

    Hi, I am from China. Reading this series of blog posts made things very clear to me. But why don’t you upload your shell scripts? I hope you see this.

  12. […] Unix Load Average […]

  13. […] Understanding Linux Load Average. Thanks @jametong. Reference: part1 part2 […]

  14. As per your blog, no matter how many CPUs you have the load is the same? I.e. adding “1 CPU load process” will increase the load by one, whether on a dual-core server or a 64-core server?
    1 CPU, 1 process = load 1,
    2 CPUs, 2 processes = load 2,
    3 CPUs, 6 processes = load 3

    But,
    this blog says it the other way http://blog.scoutapp.com/articles/2009/07/31/understanding-load-averages =>
    1 CPU, 1 process = load 1,
    2 CPUs, 2 processes = load 1,
    3 CPUs, 6 processes = load 2

    Am I mistaken? Please advise.

    • Harald van Breederode said

      Hi,

      I think you are misinterpreting both articles; I see no difference between the two postings.
      The load average will increase with each running process, i.e. 1 proc = load 1; 2 procs = load 2; 6 procs = load 6. But in order to determine whether your CPUs are fully utilized you need to divide the load by the number of CPUs.
      -Harald

  15. […] The Load Average represents the number of processes waiting in the OS scheduling queue. Unlike CPU utilization, the Load Average increases when any kind of resource is scarce (e.g. CPU, network, disk, memory, …). For more details see Understanding Linux Load Average […]
