cloudstack-dev mailing list archives

From John Burwell <jburw...@basho.com>
Subject Re: [ACS41] System VMs not syncing time - does this block the release?
Date Wed, 15 May 2013 18:48:13 GMT
Chiradeep,

As I mentioned earlier, this issue is larger than S3-backed Secondary Storage.  It just happens
that this issue was surfaced by testing that feature.  Clock drift exceeding a few
seconds can be an operational issue (e.g. file timestamps, logging, etc.).  A lack of reliable
clock sync will be an issue in any accredited environment (e.g. SOX) due to its impact on
the integrity of audit trails.

Clock drift in virtualized environments can easily exceed 15 minutes in either direction.
The drift behavior is driven by a number of environmental variables.  In my experience,
I have seen clocks drift by hours in less than 30 minutes of real time.  Therefore, I would
caution against taking a single environment as a general benchmark.  VMware has written a
white paper on the subject [1] which goes into a great deal of depth around system clocks
in virtualized environments.
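
To put numbers on the domU/dom0 readings quoted further down in this thread, here is a
minimal Python sketch that turns a pair of `date` outputs into a drift value and compares
it with the 15-minute S3 window mentioned in the thread.  The function names and the
two-entry zone table are my own illustration, and the two readings in each pair were
presumably not captured at exactly the same instant, so treat the results as approximate.

```python
from datetime import datetime, timedelta, timezone

# Assumption: only the two zone abbreviations that appear in the thread are handled.
ZONES = {"UTC": timezone.utc, "PDT": timezone(timedelta(hours=-7))}

# Skew window quoted in the thread for S3 (15 minutes).
S3_MAX_SKEW = timedelta(minutes=15)


def parse_date_output(text):
    """Parse `date` output such as 'Wed May 15 18:13:46 UTC 2013'."""
    _weekday, month, day, clock, zone, year = text.split()
    naive = datetime.strptime(f"{month} {day} {clock} {year}", "%b %d %H:%M:%S %Y")
    return naive.replace(tzinfo=ZONES[zone])


def drift(domu, dom0):
    """Return dom0 time minus domU time (positive means the guest is behind)."""
    return parse_date_output(dom0) - parse_date_output(domu)


def within_tolerance(delta, limit=S3_MAX_SKEW):
    """True if the absolute drift does not exceed the limit."""
    return abs(delta) <= limit


# Readings copied from the quoted messages below: (domU, dom0).
READINGS = {
    "r-9-VM": ("Wed May 15 17:52:37 UTC 2013", "Wed May 15 10:52:37 PDT 2013"),
    "r-535-VM": ("Wed May 15 18:13:46 UTC 2013", "Wed May 15 11:14:47 PDT 2013"),
    "r-247793-VM": ("Wed May 15 18:18:20 UTC 2013", "Wed May 15 11:18:33 PDT 2013"),
}

if __name__ == "__main__":
    for name, (domu, dom0) in READINGS.items():
        delta = drift(domu, dom0)
        print(name, delta.total_seconds(), within_tolerance(delta))
```

With those inputs, r-535-VM works out to 61 seconds and r-247793-VM to 13 seconds, both
well inside the 15-minute window.  That is consistent with my point above: one healthy
environment does not tell us much about the general case.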

All of these things being said, it appears that the Xen behavior may be a regression that
can be addressed with a relatively straightforward fix (dropping the proper file in /proc/sys/xen),
and KVM has been fixed by Marcus.  Is anyone looking at a fix for VMware?
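
For anyone unfamiliar with the KVM side, here is a rough Python sketch of the kind of
libvirt domain XML fragment that selects the 'kvmclock' timer.  This is not Marcus's
patch (that change lives in the Java LibvirtComputingResource code); the helper name and
the offset="utc" default are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET


def clock_element(timer_name="kvmclock", offset="utc"):
    """Build a libvirt-style <clock> element with a single <timer> child.

    Hypothetical helper: the attribute values are illustrative defaults,
    not values taken from the CloudStack patch.
    """
    clock = ET.Element("clock", {"offset": offset})
    ET.SubElement(clock, "timer", {"name": timer_name})
    return clock


def clock_xml(timer_name="kvmclock", offset="utc"):
    """Serialize the <clock> element to a string for inclusion in a domain definition."""
    return ET.tostring(clock_element(timer_name, offset), encoding="unicode")


if __name__ == "__main__":
    print(clock_xml())
```

Inside a guest, the active clock source can usually be checked by reading
/sys/devices/system/clocksource/clocksource0/current_clocksource.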

Thanks,
-John

[1]: http://www.vmware.com/files/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf

On May 15, 2013, at 2:29 PM, Chiradeep Vittal <Chiradeep.Vittal@citrix.com> wrote:

> The previous ones were on XS 5.6 FP2
> This one's on XS 6.0.2
> r-275166-VM 18:22:10 up 8 days,
> domU: Wed May 15 18:22:10 UTC 2013
> dom0: Wed May 15 11:22:10 PDT 2013
> 
> 
> 
> On 5/15/13 11:22 AM, "Chiradeep Vittal" <Chiradeep.Vittal@citrix.com>
> wrote:
> 
>> The normal S3 time skew tolerance is 15 minutes. I can't imagine a drift of 15
>> minutes in a few days of operation. I logged into 3 system vms running on
>> Xen and saw this drift:
>> 
>> r-9-VM 17:52:37 up 29 days, 10:33,
>> domU: Wed May 15 17:52:37 UTC 2013
>> dom0: Wed May 15 10:52:37 PDT 2013
>> 
>> r-535-VM 18:13:46 up 43 days, 20:49,
>> domU: Wed May 15 18:13:46 UTC 2013
>> dom0: Wed May 15 11:14:47 PDT 2013
>> 
>> r-247793-VM 18:18:20 up 43 days, 20:53,
>> domU: Wed May 15 18:18:20 UTC 2013
>> dom0: Wed May 15 11:18:33 PDT 2013
>> 
>> 
>> A PV kernel such as the systemvm's automatically maintains the clock sync
>> with dom0.
>> 
>> 
>> On 5/15/13 10:30 AM, "John Burwell" <jburwell@basho.com> wrote:
>> 
>>> Chiradeep,
>>> 
>>> The issue I am experiencing is that the system VMs are not syncing to dom0
>>> on devcloud (i.e. the dom0 clock and the SSVM clock are different).  As I
>>> mentioned earlier in this thread, the syncing was working previously which
>>> seems to jibe with your findings.  What mechanism is used to sync the dom0
>>> and domU clocks (e.g. NTP, kernel driver, etc)?  It may be a situation
>>> where the pieces are present, but they aren't configured properly or simply
>>> not running.
>>> 
>>> As an aside, we cannot run VirtualBox Additions on devcloud due to a
>>> conflict with the Xen kernel.  Therefore, I manually run "ntpdate pool.ntp.org"
>>> periodically on the devcloud host to keep the clock synced with the "real
>>> world".  Another approach is to configure NTP to tolerate a very large drift
>>> and increase the check frequency to accommodate the large clock swings that
>>> can occur.
>>> 
>>> Thanks,
>>> -John
>>> 
>>> 
>>> On Wed, May 15, 2013 at 1:21 PM, Chiradeep Vittal
>>> <Chiradeep.Vittal@citrix.com> wrote:
>>> 
>>>> Perhaps this is a problem with DevCloud?
>>>> 
>>>> http://nerdboys.com/2011/03/15/how-to-fix-virtualbox-time-synchonization-problems/
>>>> 
>>>> 
>>>> 
>>>> On 5/15/13 10:17 AM, "Chiradeep Vittal" <Chiradeep.Vittal@citrix.com>
>>>> wrote:
>>>> 
>>>>> According to our resident Xen expert, any PV kernel automatically syncs to
>>>>> the hardware clock on dom0.
>>>>> 
>>>>> On 5/15/13 9:50 AM, "John Burwell" <jburwell@basho.com> wrote:
>>>>> 
>>>>>> Marcus,
>>>>>> 
>>>>>> Agreed.  I think we need to add a set of hypervisor agnostic  time
>>>>>> keeping guidelines to the documentation.  I just wanted to make sure
>>>>>> there wasn't anything KVM specific that should be added as well.
>>>>>> 
>>>>>> Thanks,
>>>>>> -John
>>>>>> 
>>>>>> On May 15, 2013, at 12:48 PM, Marcus Sorensen <shadowsor@gmail.com>
>>>>>> wrote:
>>>>>> 
>>>>>>> Just the general one that system vms sync their time to the
>>>>>>> hypervisor, thus admins need to keep the hypervisor time correct. It
>>>>>>> sounds like that will be the case for all three, if we can manage it.
>>>>>>> 
>>>>>>> On Wed, May 15, 2013 at 10:44 AM, John Burwell <jburwell@basho.com>
>>>>>>> wrote:
>>>>>>>> Marcus,
>>>>>>>> 
>>>>>>>> Excellent.  So, it looks like we have KVM resolved.  We just need to
>>>>>>>> address Xen and VMware now.  Do you think we need to add any guidance to
>>>>>>>> the documentation regarding KVM time keeping (e.g. environmental
>>>>>>>> prerequisites)?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> -John
>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On May 15, 2013, at 12:39 PM, Marcus Sorensen <shadowsor@gmail.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> KVM LibvirtComputingResource has been patched in master. Tested on
>>>>>>>>> master, 4.1, and both the acton and current system vm templates. This
>>>>>>>>> patch makes system vms use 'kvmclock' for their timer, which is a vm
>>>>>>>>> driver that gets its time from the hypervisor. No change to the
>>>>>>>>> system vm template itself.
>>>>>>>>> 
>>>>>>>>> bfc5887a1bf6b41e88dd7a8f9987fcee8d3d9175
>>>>>>>>> 
>>>>>>>>> On Wed, May 15, 2013 at 9:08 AM, Chip Childers
>>>>>>>>> <chip.childers@sungard.com> wrote:
>>>>>>>>>> On Wed, May 15, 2013 at 11:03:16AM -0400, John Burwell wrote:
>>>>>>>>>>> Chip,
>>>>>>>>>>> 
>>>>>>>>>>> One other item I neglected to mention was that clock sync, at least
>>>>>>>>>>> for Xen system VMs, wasn't an issue in the Jan-Feb timeframe.
>>>>>>>>>>> Previously when I encountered these issues, syncing the host's clock
>>>>>>>>>>> and rebuilding the system VMs addressed the issue.  I assumed, but
>>>>>>>>>>> never verified, that the SSVM was syncing back against the host's
>>>>>>>>>>> clock through the hypervisor.  During my testing yesterday, aside from
>>>>>>>>>>> hard setting the clock, I was unable to force clock sync on the
>>>>>>>>>>> SSVM.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> -John
>>>>>>>>>> 
>>>>>>>>>> I think that's our issue right now...  answering the question: Why is
>>>>>>>>>> this only an issue now?  Did we just get lucky up to this point?  Since
>>>>>>>>>> the SSVMs are the same template as the timeframe you mention, I tend to
>>>>>>>>>> believe that you / we were just lucky.
>>>>>>>>>> 
>>>>>>>>>> Anyone else have thoughts?
>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On May 15, 2013, at 10:18 AM, Chip Childers
>>>>>>>>>>> <chip.childers@sungard.com> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Starting a thread on this specific issue.
>>>>>>>>>>>> 
>>>>>>>>>>>> CLOUDSTACK-2492 was opened, which is basically the fact that the System
>>>>>>>>>>>> VMs aren't syncing time to the host or to an NTP server.  The S3
>>>>>>>>>>>> integration is broken because of this problem, and therefore could not
>>>>>>>>>>>> be considered a function available in 4.1 if we release as is.
>>>>>>>>>>>> 
>>>>>>>>>>>> We need input from people that know about the current system VMs (the
>>>>>>>>>>>> 3.x VMs), as well as the possibility of using the newer ones that we
>>>>>>>>>>>> have been considering experimental for 4.1.0.
>>>>>>>>>>>> 
>>>>>>>>>>>> What should we do?
>>>>>>>>>>>> 
>>>>>>>>>>>> -chip
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>>> 
>> 
> 

