Why does my Linux virtual machine lose time?
Posted by Harald van Breederode on February 8, 2009
Despite being a real Unix adept, I run Windows XP on my laptop because it is much more accessible to the blind: more screen readers are available for it, and I have found them to be more feature-rich than Unix screen readers (if those exist at all). That doesn't mean I can't run Linux on my laptop as well. In fact, I prefer to run my Oracle demos in a VMware Server virtual machine running Oracle Enterprise Linux, which in turn runs my Oracle10g and Oracle11g databases.

The problem I encountered with this setup is that my Linux virtual machines lose time, and the clock falls behind rather quickly. After a bit of research I discovered that this is caused by the default Linux kernel running at an internal clock frequency of 1000Hz, and by VMware being unable to deliver that many clock interrupts on time without losing some of them. The lost interrupts go unnoticed by the Linux kernel, which assumes that each interrupt marks 1/1000th of a second, so every lost interrupt makes the clock fall behind by 1/1000th of a second.

Although you can let VMware synchronize the guest O/S clock with the host O/S clock, I don't recommend this because it makes the Linux clock very bumpy. As I understand it, with this synchronization enabled VMware sets the Linux clock equal to the host clock every minute. So if my clock falls 3 seconds behind every minute, it jumps forward 3 seconds each time VMware synchronizes it. You can imagine what this does to the Oracle database instrumentation. I also tried to keep the clock synchronized using the Network Time Protocol (NTP), but that didn't work because the time loss is too unpredictable and NTP gave up. Nothing else I tried solved the problem either. The solution is to recompile the Linux kernel with an internal clock frequency of 100Hz.
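To put numbers on this, the sketch below uses plain POSIX shell arithmetic to compute how many timer interrupts the host has to deliver per minute, and how many must have been lost to explain a given drift. The function names are hypothetical, and the sketch assumes, as described above, that every lost interrupt costs exactly one tick of 1/HZ seconds.

```shell
#!/bin/sh
# Back-of-the-envelope arithmetic for the clock drift of a tick-based kernel.
# Assumption: every lost clock interrupt costs exactly one tick of 1/HZ seconds.

# interrupts_per_minute HZ: timer interrupts the host must deliver per minute
interrupts_per_minute() {
    echo $(( $1 * 60 ))
}

# lost_ticks HZ DRIFT_SECONDS: ticks that must have gone missing to explain a
# drift of DRIFT_SECONDS (whole seconds only, since sh arithmetic is integer)
lost_ticks() {
    echo $(( $1 * $2 ))
}

# Usage:
#   interrupts_per_minute 1000   # expected interrupts per minute at 1000Hz
#   lost_ticks 1000 3            # interrupts lost when 3 seconds drift per minute
#   interrupts_per_minute 100    # expected interrupts per minute at 100Hz
```

With the 3-seconds-per-minute example, a 1000Hz guest must have lost 3000 of its 60000 interrupts in that minute. A 100Hz kernel only asks VMware for 6000 interrupts per minute in the first place, which is the load reduction the recompilation aims for.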
Recompiling the Linux kernel
Note: The following procedure is only applicable to Oracle Enterprise Linux 5. If there is enough demand I will explain the procedure for Oracle Enterprise Linux 4 in a future posting.
To recompile the Linux kernel I first need to know which kernel I am running, and then I need the source code for that kernel. I can get the kernel release with the uname command as shown below:
# uname -r
2.6.18-128.0.0.0.2.el5
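The release string maps directly onto the name of the source RPM to download. The hypothetical helper srpm_name below derives it; it assumes the EL5 convention that variant kernels report a suffix such as PAE or xen in uname -r while sharing a single source RPM.

```shell
#!/bin/sh
# srpm_name RELEASE: name of the kernel source RPM matching a kernel release.
# Assumption: EL5 variant kernels append PAE or xen to the release string,
# which the source RPM name does not carry, so those suffixes are stripped.
srpm_name() {
    rel=$1
    rel=${rel%PAE}
    rel=${rel%xen}
    echo "kernel-${rel}.src.rpm"
}

# Usage on the virtual machine itself:
#   srpm_name "$(uname -r)"
```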
Next I can download the kernel source code from the Oracle Open Source website. In my case I need to download the kernel-2.6.18-128.0.0.0.2.el5.src.rpm file. Once downloaded, I can install this kernel source RPM with the rpm command as follows:
# rpm -i kernel-2.6.18-128.0.0.0.2.el5.src.rpm
Note: The ‘#’ prompt indicates that I ran this as the root user. There will also be some warnings, which can be ignored.
The kernel sources are now installed in the /usr/src/redhat/SOURCES directory, and a so-called SPEC file, which will be used to build the kernel RPM, is installed in /usr/src/redhat/SPECS. Before recompiling the kernel I first need to change the internal clock frequency from 1000Hz to 100Hz. This is done by changing a setting in a configuration file. The name of this configuration file depends on the hardware architecture, so I first need to get my machine type with the uname command as follows:
# uname -m
i686
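The configuration file name combines the upstream kernel version with the machine type. A minimal sketch, using the hypothetical helper config_path and assuming the kernel-<version>-<machine>.config naming seen in this example holds in general:

```shell
#!/bin/sh
# config_path RELEASE MACHINE: location of the kernel configuration file to edit.
# Assumption: the file is named kernel-<upstream version>-<machine>.config,
# as in kernel-2.6.18-i686.config.
config_path() {
    base=${1%%-*}    # strip everything from the first dash: 2.6.18-128... -> 2.6.18
    echo "/usr/src/redhat/SOURCES/kernel-${base}-$2.config"
}

# Usage:
#   config_path "$(uname -r)" "$(uname -m)"
```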
The configuration file is located in /usr/src/redhat/SOURCES and its name is kernel-2.6.18-i686.config. In this file I need to change the line CONFIG_HZ_1000=y into CONFIG_HZ_100=y, after which I am ready to compile the kernel with the rpmbuild command, using the SPEC file as its input, as shown below:
# cd /usr/src/redhat/SPECS
# rpmbuild --target=i686 -bb kernel-2.6.spec
This will run for an hour or more, generating lots of output. Once it has finished, the compiled kernel RPM, kernel-2.6.18-128.0.0.0.2.el5.i686.rpm, is in /usr/src/redhat/RPMS/i686, waiting to be installed.
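If the kernel has to be rebuilt more than once, for example after a kernel update, the 1000Hz to 100Hz edit described above can be scripted and applied before running rpmbuild. The hypothetical helper set_hz_100 is a sketch that assumes GNU sed (for -i) and a configuration file in the usual kconfig format. Besides the CONFIG_HZ_1000=y change, it also updates the numeric CONFIG_HZ line, an extra step the procedure above does not mention.

```shell
#!/bin/sh
# set_hz_100 FILE: switch a kernel configuration file from 1000Hz to 100Hz in place.
# Assumes GNU sed and kconfig-style lines ("# CONFIG_X is not set" / "CONFIG_X=y").
set_hz_100() {
    sed -i \
        -e 's/^# CONFIG_HZ_100 is not set$/CONFIG_HZ_100=y/' \
        -e 's/^CONFIG_HZ_1000=y$/# CONFIG_HZ_1000 is not set/' \
        -e 's/^CONFIG_HZ=1000$/CONFIG_HZ=100/' \
        "$1" || return 1
    # Make sure the 100Hz choice is present even if the "is not set" line was missing
    grep -q '^CONFIG_HZ_100=y$' "$1" || echo 'CONFIG_HZ_100=y' >> "$1"
}

# Usage (keep a backup first):
#   cd /usr/src/redhat/SOURCES
#   cp kernel-2.6.18-i686.config kernel-2.6.18-i686.config.orig
#   set_hz_100 kernel-2.6.18-i686.config
#   grep '^CONFIG_HZ' kernel-2.6.18-i686.config
```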
Installing the new kernel
The new kernel can be installed with the rpm command, but the kernel it replaces is the one currently running, so a reboot into a different kernel is required before continuing. After the reboot I recommend removing the currently installed kernel with rpm before installing the newly compiled one, as follows:
# rpm -e kernel-2.6.18-128.0.0.0.2.el5
# rpm -i kernel-2.6.18-128.0.0.0.2.el5.i686.rpm
Note: Removing the kernel may fail because other packages depend on it. Those packages have to be removed before the kernel is removed, and reinstalled afterwards.
A final reboot is required and after setting the clock it should never run behind anymore, but setting up NTP is still a wise thing to do.
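After the final reboot it is easy to confirm that the running kernel really uses 100Hz. This sketch assumes that, as on EL5, the kernel package installs its build configuration as /boot/config-<release>; config_hz is a hypothetical helper name.

```shell
#!/bin/sh
# config_hz FILE: print the CONFIG_HZ value recorded in a kernel configuration file.
config_hz() {
    sed -n 's/^CONFIG_HZ=\([0-9][0-9]*\)$/\1/p' "$1"
}

# Usage after rebooting into the new kernel (should print 100):
#   config_hz "/boot/config-$(uname -r)"
```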
Warning: Recompiling (and installing) the Linux kernel yourself makes your environment unsupported by Oracle and should never be done on a production environment.
-Harald
Ilmar Kerm said
In the VMware KB there is an article about timekeeping in Linux.
No kernel recompilation is required.
http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006427
Paul van Eldijk said
@ Ilmar:
Have you tried VMWare’s solutions? I’ve never gotten them to work, but OK, YMMV.
@ Harald:
It seems that more recent Linux releases (e.g. RH-ES 5, CentOS 5) actually do work properly under VMware.
regards,
Paul
Ilmar Kerm said
Yes, I have used these parameters with RHEL 3, 4 and 5 and they work. But I use only VMWare Infrastructure, not VMWare Server.
Adrian Hollay said
Hello,
for Suse SLES running 32-bit and 64-bit kernels I am using the “clock=pit” kernel setting, and simply a crontab entry that sets the clock every minute using ntpdate:
* * * * * /usr/sbin/ntpdate 192.168.1.1 >> /var/log/ntpdate.log 2>&1
The offsets are very small:
…
12 Feb 12:44:06 ntpdate[24789]: adjust time server 192.168.1.1 offset 0.000422 sec
12 Feb 12:45:06 ntpdate[26079]: adjust time server 192.168.1.1 offset -0.000209 sec
12 Feb 12:46:06 ntpdate[27386]: adjust time server 192.168.1.1 offset 0.000134 sec
12 Feb 12:47:06 ntpdate[28688]: adjust time server 192.168.1.1 offset 0.007115 sec
12 Feb 12:48:06 ntpdate[29965]: adjust time server 192.168.1.1 offset -0.010231 sec
12 Feb 12:49:06 ntpdate[31267]: adjust time server 192.168.1.1 offset 0.005129 sec
…
This configuration has been working for me for a couple of years already, and it is also suitable for running Oracle Clusterware.
Adrian Hollay
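To summarise a log like the one above, a short awk script can report the largest absolute offset. The function name max_offset is hypothetical; the parsing relies only on the "offset <value> sec" format shown in the ntpdate lines.

```shell
#!/bin/sh
# max_offset LOGFILE: largest absolute offset (in seconds) reported by ntpdate.
max_offset() {
    awk 'BEGIN { max = 0 }
    /adjust time server/ {
        for (i = 1; i < NF; i++)
            if ($i == "offset") {
                v = $(i + 1) + 0        # numeric value following "offset"
                if (v < 0) v = -v       # we care about the size, not the direction
                if (v > max) max = v
            }
    }
    END { printf "%.6f\n", max }' "$1"
}

# Usage:
#   max_offset /var/log/ntpdate.log
```

For the six log lines shown above this prints 0.010231, i.e. the worst correction in that window was about 10 milliseconds.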
Harald van Breederode said
Hi Ilmar,
Many, many thanks for pointing me to this KB article. Yesterday I installed a 1000Hz kernel in both an EL4 and an EL5 VMware Server VM, and with the kernel parameters mentioned there they both keep time, even without NTP. This saves me quite a few kernel compilations.
If I had known this a week ago, I wouldn't have had to write this posting ;-)
-Harald
Timekeeping in VMware… o my… « SE Stuff and the like… said
[…] then me Marco Gralike • Prutser for breaking open the kernel discussion, good article there. https://prutser.wordpress.com/2009/02/08/why-does-my-linux-virtual-machine-lose-time/ • VMware for maintaining there KB so well • You for taking the time to read this […]
dikkiedick said
Harald,
I noticed during the training you gave last week that the time on your VMware servers was falling further and further behind the time on your Windows laptop. Why was that, given that you've found a solution for the problem?
Greetings, Dick
Harald van Breederode said
Hi Dick,
I recently installed a new kernel and somehow the extra kernel arguments “clock=pmtmr divider=10” from /etc/grub.conf were lost in this process. I re-added them and time keeping is back to normal.
-Harald
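Since such kernel arguments can silently disappear when a new kernel is installed, a small helper that re-adds them may be useful. This is a sketch: add_kernel_args is a hypothetical helper that appends the given arguments to every GRUB legacy "kernel" line that lacks them, writing back through the original file so that a symlinked /etc/grub.conf stays a symlink. Whether clock=pmtmr divider=10 is right for a particular kernel and architecture should be checked against the VMware KB article mentioned earlier.

```shell
#!/bin/sh
# add_kernel_args FILE ARGS: append ARGS to each GRUB legacy "kernel" line
# that does not already contain the exact string ARGS.
add_kernel_args() {
    file=$1
    args=$2
    tmp=$(mktemp) || return 1
    awk -v args="$args" '
        /^[ \t]*kernel / && index($0, args) == 0 { $0 = $0 " " args }
        { print }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"   # cat keeps a symlinked target intact
    rm -f "$tmp"
}

# Usage (as root, after backing up the file):
#   cp /etc/grub.conf /etc/grub.conf.orig
#   add_kernel_args /etc/grub.conf "clock=pmtmr divider=10"
```

Running it twice is harmless: lines that already carry the arguments are left alone.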