cloudstack-dev mailing list archives

From Marcus Sorensen <shadow...@gmail.com>
Subject Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?
Date Wed, 22 May 2013 16:56:54 GMT
If this were creating a new bug, for example "oh, your VPCs won't work
anymore for this release", or "here's a new UI, but it's really buggy
and barely functional" then I'd agree with this train of thought.
Instead, we are saying "we recently found out that CloudStack has had
this behavior since 2.2.x, and it will be fixed in 4.2"*.
That's a totally different thing. If 4.1 ends up being remembered as a
poor-quality release compared to others, it's not going to be because
we didn't address something that has been around for several releases
and that nobody has noticed.

* Assuming we verify that it's not a regression, which I'm still very
interested in knowing

On Wed, May 22, 2013 at 9:51 AM, John Burwell <jburwell@basho.com> wrote:
> Marcus,
>
> I would say that the only thing worse for an open source project than not
> releasing is releasing a poor quality release.  A late release with high
> quality is soon forgotten.  An on-time or late release with poor quality
> lingers in folks' memory.  The KDE project made the near-fatal mistake of
> following the same logic when they released 4.0, and the reputation of
> KDE 4.x continues to suffer from it to this day.  CloudStack is trusted to
> run at the core of our users' operations.  In my view, if we err, we should
> err on the side of quality to avoid erosion of that trust.  If we ever lost
> that trust, our new features would never be evaluated.
>
> Thanks,
> -John
>
> On May 22, 2013, at 11:18 AM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>
>> Thanks for the response. Time sync is certainly an issue, I think one
>> of the things we are trying to gauge is whether the system vm
>> functionality has been impacted by time sync such that anyone has
>> noticed or cared.  That's not to detract from the point that having
>> time sync is optimal, and affects a lot of things, but functionally,
>> back to my item #1, can we confirm that earlier versions have gotten
>> out of sync, and if so, do we have bug reports showing that it has
>> mattered?
>>
>> To counter the argument, there are plenty of people looking for the
>> features in 4.1 who won't choose CloudStack because it's not
>> released yet. Then there's the delay impact to 4.2, and keeping all of
>> those features out of the hands of people as well.
>>
>> For me, the fear is that we end up pushing 4.1 back to or near where
>> 4.2 would have been otherwise released, at which point we haven't
>> really accomplished anything but delayed the release of the working
>> features in 4.1.
>>
>>
>> On Wed, May 22, 2013 at 9:09 AM, John Burwell <jburwell@basho.com> wrote:
>>> Marcus,
>>>
>>> For me, S3 integration and Xen feature parity are not the primary reasons
>>> that this defect should remain a blocker.  Time synchronization is a basic
>>> and essential assumption for systems such as CloudStack.  This defect
>>> yields file and log timestamps from secondary storage that are unreliable
>>> -- impacting customers who operate in an accredited environment (e.g. SOX)
>>> or who rely on those timestamps for any downstream operations.  It also
>>> stands as a significant impediment to operational debugging.  Additionally,
>>> as others have pointed out, time drifts also impact encryption, and
>>> possibly handshake operations between the system VMs and the management
>>> server.  While I appreciate and fully support a time-based release cycle,
>>> there has to be a quality threshold for any release.  Looking at it from an
>>> operations perspective, failure to maintain time sync across components is
>>> unacceptable.  Assuming I used Xen, I ask myself, "Would I deploy a 4.1.0
>>> if the known issues list stated that the system VMs could not maintain
>>> time sync?", and, without hesitation, I would answer, "No.", and follow it
>>> up quickly, "Oh no, I hope the release I have in production doesn't have
>>> this problem."
>>>
>>> Thanks,
>>> -John
>>>
>>> On May 22, 2013, at 10:35 AM, Marcus Sorensen <shadowsor@gmail.com> wrote:
>>>
>>>> I feel like we need to clarify what's at risk here. Not to disrespect
>>>> anyone's opinion, but I'm just not getting where this is being
>>>> considered a major feature.  I think the very idea of Xen not having
>>>> feature parity (regardless of the feature) is distasteful to a lot of
>>>> us, and it should be. But consider that we are already two months
>>>> behind on a four month release cycle, and it sounds like fixing this
>>>> could take a month (if no issues are found, two weeks to qual the new
>>>> template). We run a time-based release, not a feature-based release.
>>>> Not all features are expected to be fully functional to get out the
>>>> door. Isn't the correct option to just mark the feature experimental
>>>> and tell users to run the newer template at their own risk if they want it?
>>>>
>>>> 1) We need to verify whether this bug has been around for a long time,
>>>> because it will tell us how much it really matters and thus whether or
>>>> not it's a blocker. This addresses the 'timestamp of logs' and other
>>>> issues not related to new features.
>>>>
>>>> 2) We need to reiterate exactly what features are being affected. The
>>>> original e-mail lists 'S3 integration' as the only feature affected.
>>>> As far as I understand it, the actual feature impacted is a 'secondary
>>>> storage sync': if you have multiple zones with multiple secondary
>>>> storages, it backs up and handles the copying of templates, etc., so
>>>> you don't have to manually register them everywhere.
>>>>
>>>> I appreciate John's work for getting that secondary storage sync
>>>> feature in place. I really wish we had noticed the issue earlier
>>>> on; then we might not be having this discussion. That said, no
>>>> disrespect intended toward John, I'm having a hard time understanding
>>>> how this is a feature worth holding up the release. It's not a new
>>>> primary or secondary storage type integration, and it's not a feature
>>>> where the admin is helpless to do it themselves. If VPC doesn't work,
>>>> the admin can't do anything about it. If this sync doesn't work, the
>>>> admin writes a script that copies their stuff everywhere.
>>>>
>>>> Please, if anyone considers this a major feature worth blocking on,
>>>> explain to us why. Are you willing to push back release of all of the
>>>> other new features, and push back the 4.2 features, to have this one
>>>> feature in June, or whenever 4.1 gets out?
>>>>
>>>>
>>>> On Wed, May 22, 2013 at 2:14 AM, Sebastien Goasguen <runseb@gmail.com> wrote:
>>>>> +1 on moving forward.
>>>>>
>>>>> On this issue and on the upgrade issue I have realized that we forgot
>>>>> about our time-based release philosophy.
>>>>>
>>>>> There will always be bugs in the software. If we know them we can
>>>>> acknowledge them in release notes and get started quickly on the next
>>>>> releases.
>>>>>
>>>>> To keep it short, I am now of the opinion (and I know I am kind of
>>>>> changing my mind here) that we should release 4.1 asap and start working
>>>>> on the bug fix versions right away.
>>>>>
>>>>> If we do release often, then folks stuck on a particular bug can expect
>>>>> a quick turnaround and fix of their problems.
>>>>>
>>>>> -sebastien
>>>>>
>>>>> On May 22, 2013, at 2:59 AM, Mathias Mullins <mathias.mullins@citrix.com> wrote:
>>>>>
>>>>>> -1 on this.
>>>>>>
>>>>>> New features really should be across the board for the Hypervisors. Part
>>>>>> of the thing that distinguishes ACS is its support across Xen / VMware /
>>>>>> KVM. Do we really want to start getting in the habit of pushing forward
>>>>>> new features that are not across the fully functional hypervisors?
>>>>>>
>>>>>> I agree with Outback that this will also start to affect the Xen/XCP
>>>>>> community by basically setting them apart and out on what a lot of people
>>>>>> see as a major feature.
>>>>>>
>>>>>> I think it sets a really bad precedent. If it was Hyper-V, which is not
>>>>>> fully functional and not a major feature-set right now, I would be +1 on
>>>>>> this.
>>>>>>
>>>>>> MHO
>>>>>> Matt
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 5/20/13 4:15 PM, "Chip Childers" <chip.childers@sungard.com> wrote:
>>>>>>
>>>>>>> All,
>>>>>>>
>>>>>>> As discussed on another thread [1], we identified a bug
>>>>>>> (CLOUDSTACK-2492) in the current 3.x system VMs, where the System VMs
>>>>>>> are not configured to sync their time with either the host HV or an NTP
>>>>>>> service.  That bug affects the system VMs for all three primary HVs (KVM,
>>>>>>> Xen and vSphere).  Patches have been committed addressing vSphere and
>>>>>>> KVM.  It appears that a correction for Xen would require the re-build of
>>>>>>> a system VM image and a full round of regression testing that image.
>>>>>>>
>>>>>>> Given that the discussion thread has not resulted in a consensus on this
>>>>>>> issue, I unfortunately believe that the only path forward is to call for
>>>>>>> a formal VOTE.
>>>>>>>
>>>>>>> Please respond with one of the following:
>>>>>>>
>>>>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being
>>>>>>> resolved
>>>>>>> +0: don't care one way or the other
>>>>>>> -1: do *not* proceed with any further 4.1 release candidates until
>>>>>>> CLOUDSTACK-2492 has been fully resolved
>>>>>>>
>>>>>>> -chip
>>>>>>>
>>>>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
