cloudstack-dev mailing list archives

From Chiradeep Vittal <Chiradeep.Vit...@citrix.com>
Subject Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?
Date Wed, 22 May 2013 17:01:33 GMT
As the author of the original systemvm (and current contributor to the
systemvm), I can confidently state that this issue has been there since
2.2.0. 
The issue is that the Debian 2.6.32 kernel is a PVOPS kernel. All PVOPS
kernels require NTP to keep time in sync.
http://www.gossamer-threads.com/lists/xen/users/234750

On 5/22/13 9:56 AM, "Marcus Sorensen" <shadowsor@gmail.com> wrote:

>If this were creating a new bug, for example "oh, your VPCs won't work
>anymore for this release", or "here's a new UI, but it's really buggy
>and barely functional" then I'd agree with this train of thought.
>Instead, we are saying "we recently found out that since 2.2.x
>cloudstack has had this behavior, and it will be fixed in 4.2"*.
>That's a totally different thing. If 4.1 ends up being a poor quality
>release that everyone remembers compared to others, it's not going to
>be because we didn't address something that has been around for
>several releases, that nobody has noticed.
>
>* Assuming we verify that it's not a regression, which I'm still very
>interested in knowing
>
>On Wed, May 22, 2013 at 9:51 AM, John Burwell <jburwell@basho.com> wrote:
>> Marcus,
>>
>> I would say that the only thing for an open source project worse than
>>not releasing is releasing a poor quality release.  A late release with
>>high quality is soon forgotten.  An on-time or late release with poor
>>quality lingers in folks' memory. The KDE project made the near-fatal
>>mistake of following the same logic when they released 4.0, and the
>>reputation of KDE 4.x continues to suffer from it to this day.
>>CloudStack is trusted to run at the core of our users' operations.  In my
>>view, if we err, we should err on the side of quality to avoid
>>erosion of that trust.  If we ever lost that trust, our new features
>>would never be evaluated.
>>
>> Thanks,
>> -John
>>
>> On May 22, 2013, at 11:18 AM, Marcus Sorensen <shadowsor@gmail.com>
>>wrote:
>>
>>> Thanks for the response. Time sync is certainly an issue, I think one
>>> of the things we are trying to gauge is whether the system vm
>>> functionality has been impacted by time sync such that anyone has
>>> noticed or cared.  That's not to detract from the point that having
>>> time sync is optimal, and affects a lot of things, but functionally,
>>> back to my item #1, can we confirm that earlier versions have gotten
>>> out of sync, and if so, do we have bug reports showing that it has
>>> mattered?
>>>
>>>  To counter the argument, there are plenty of people looking for the
>>> features in 4.1, that wouldn't choose cloudstack because it's not
>>> released yet. Then there's the delay impact to 4.2, and keeping all of
>>> those features out of the hands of people as well.
>>>
>>> For me, the fear is that we end up pushing 4.1 back to or near where
>>> 4.2 would have been otherwise released, at which point we haven't
>>> really accomplished anything but delayed the release of the working
>>> features in 4.1.
>>>
>>>
>>> On Wed, May 22, 2013 at 9:09 AM, John Burwell <jburwell@basho.com>
>>>wrote:
>>>> Marcus,
>>>>
>>>> For me, S3 integration and Xen feature parity are not the primary
>>>>reasons that this defect should remain a blocker.  Time
>>>>synchronization is a basic and essential assumption for systems such
>>>>as CloudStack.  This defect yields file and log timestamps from
>>>>secondary storage that are unreliable -- impacting customers in an
>>>>accredited environment (e.g. SOX) or that rely on those timestamps for
>>>>any downstream operations.  It also stands as a significant impediment
>>>>to operational debugging.  Additionally, as others have pointed out,
>>>>time drifts also impact encryption, and possibly handshake operations
>>>>between the system VMs and management server.  While I appreciate and
>>>>fully support a time-based release cycle, there has to be a quality
>>>>threshold for any release.  Looking at it from an operations
>>>>perspective, failure to maintain time sync across components is
>>>>unacceptable.   Assuming I used Xen, I ask myself, "Would I deploy a
>>>>4.1.0 if the known issues list stated that the system VMs could not
>>>>maintain time sync?", and, without hesitation, I would answer, "No.",
>>>>and follow it up quickly, "Oh no, I hope the release I have in
>>>>production doesn't have this problem."
>>>>
>>>> Thanks,
>>>> -John
>>>>
>>>> On May 22, 2013, at 10:35 AM, Marcus Sorensen <shadowsor@gmail.com>
>>>>wrote:
>>>>
>>>>> I feel like we need to clarify what's at risk here. Not to disrespect
>>>>> anyone's opinion, but I'm just not getting where this is being
>>>>> considered a major feature.  I think the very idea of Xen not having
>>>>> feature parity (regardless of the feature) is distasteful to a lot of
>>>>> us, and it should be. But consider that we are already two months
>>>>> behind on a four month release cycle, and it sounds like fixing this
>>>>> could take a month (if no issues are found, two weeks to qual the new
>>>>> template). We run a time-based release, not a feature-based release.
>>>>> Not all features are expected to be fully functional to get out the
>>>>> door. Isn't the correct option to just mark the feature experimental,
>>>>> tell them to run the newer template at their risk if they want it?
>>>>>
>>>>> 1) We need to verify whether this bug has been around for a long time,
>>>>> because it will tell us how much it really matters and thus whether or
>>>>> not it's a blocker. This addresses the 'timestamp of logs' and other
>>>>> issues not related to new features.
>>>>>
>>>>> 2) We need to reiterate exactly what features are being affected. The
>>>>> original e-mail lists 'S3 integration' as the only feature affected.
>>>>> As far as I understand it, the actual feature impacted is a 'secondary
>>>>> storage sync': if you have multiple zones and multiple secondary
>>>>> storages, it backs up and handles the copying of templates, etc., so
>>>>> you don't have to manually register them everywhere.
>>>>>
>>>>> I appreciate John's work for getting that secondary storage sync
>>>>> feature in place. I really wish we had noticed the issue
>>>>> earlier on; then we might not be having this discussion. That said, no
>>>>> disrespect intended toward John, I'm having a hard time understanding
>>>>> how this is a feature worth holding up the release. It's not a new
>>>>> primary or secondary storage type integration, and it's not a feature
>>>>> where the admin is helpless to do it themselves. If VPC doesn't work,
>>>>> the admin can't do anything about it. If this sync doesn't work, the
>>>>> admin writes a script that copies their stuff everywhere.
>>>>>
>>>>> Please, if anyone considers this a major feature worth blocking on,
>>>>> explain to us why. Are you willing to push back release of all of the
>>>>> other new features, and push back the 4.2 features, to have this one
>>>>> feature in June, or whenever 4.1 gets out?
>>>>>
>>>>>
>>>>> On Wed, May 22, 2013 at 2:14 AM, Sebastien Goasguen
>>>>><runseb@gmail.com> wrote:
>>>>>> +1 on moving forward.
>>>>>>
>>>>>> On this issue and on the upgrade issue I have realized that we
>>>>>>forgot about our time based release philosophy.
>>>>>>
>>>>>> There will always be bugs in the software. If we know them we can
>>>>>>acknowledge them in release notes and get started quickly on the
>>>>>>next releases.
>>>>>>
>>>>>> To keep it short, I am now of the opinion (and I know I am kind of
>>>>>>switching mind here), that we should release 4.1 asap and start
>>>>>>working on the bug fix versions right away.
>>>>>>
>>>>>> If we do release often, then folks stuck on a particular bug can
>>>>>>expect a quick turn around and fix of their problems.
>>>>>>
>>>>>> -sebastien
>>>>>>
>>>>>> On May 22, 2013, at 2:59 AM, Mathias Mullins
>>>>>><mathias.mullins@citrix.com> wrote:
>>>>>>
>>>>>>> -1 on this.
>>>>>>>
>>>>>>> New features really should be across the board for the hypervisors. Part
>>>>>>> of the thing that distinguishes ACS is its support across Xen / VMware /
>>>>>>> KVM. Do we really want to start getting in the habit of pushing forward
>>>>>>> new features that are not across the fully functional hypervisors?
>>>>>>>
>>>>>>> I agree with Outback that this also will start to affect the Xen/XCP
>>>>>>> community by basically setting them apart and out on what a lot of
>>>>>>> people see as a major feature.
>>>>>>>
>>>>>>> I think it sets a really bad precedent. If it was Hyper-V, which is not
>>>>>>> fully functional and not a major feature-set right now, I would be +1 on
>>>>>>> this.
>>>>>>>
>>>>>>> MHO
>>>>>>> Matt
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 5/20/13 4:15 PM, "Chip Childers" <chip.childers@sungard.com>
>>>>>>>wrote:
>>>>>>>
>>>>>>>> All,
>>>>>>>>
>>>>>>>> As discussed on another thread [1], we identified a bug
>>>>>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the system VMs
>>>>>>>> are not configured to sync their time with either the host HV or an NTP
>>>>>>>> service.  That bug affects the system VMs for all three primary HVs (KVM,
>>>>>>>> Xen and vSphere).  Patches have been committed addressing vSphere and
>>>>>>>> KVM.  It appears that a correction for Xen would require the re-build of
>>>>>>>> a system VM image and a full round of regression testing that image.
>>>>>>>>
>>>>>>>> Given that the discussion thread has not resulted in a consensus on this
>>>>>>>> issue, I unfortunately believe that the only path forward is to call for
>>>>>>>> a formal VOTE.
>>>>>>>>
>>>>>>>> Please respond with one of the following:
>>>>>>>>
>>>>>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>>>>>>>> resolved
>>>>>>>> +0: don't care one way or the other
>>>>>>>> -1: do *not* proceed with any further 4.1 release candidates until
>>>>>>>> CLOUDSTACK-2492 has been fully resolved
>>>>>>>>
>>>>>>>> -chip
>>>>>>>>
>>>>>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
>>>>>>>
>>>>>>
>>>>
>>

