From: Chiradeep Vittal <Chiradeep.Vittal@citrix.com>
To: dev@cloudstack.apache.org
Subject: Re: [VOTE] Move forward with 4.1 without a Xen-specific fix for CLOUDSTACK-2492?
Date: Wed, 22 May 2013 17:01:33 +0000

As the author of the original systemvm (and a current contributor to it), I can confidently state that this issue has been there since 2.2.0.

The issue is that the Debian 2.6.32 kernel is a PVOPS kernel, and all PVOPS kernels require ntp to keep time in sync:
http://www.gossamer-threads.com/lists/xen/users/234750
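As an illustration of the symptom Chiradeep describes, the sketch below measures how far a guest's clock has drifted by making a plain SNTP query and comparing it to the local time. The server name is an illustrative assumption, and this is only a way to observe the problem, not the fix being voted on.

    #!/usr/bin/env python
    # Sketch: measure how far the local clock has drifted from NTP time.
    # This observes the symptom discussed above; it is not the CloudStack fix.
    # The server name below is an illustrative assumption.
    import socket
    import struct
    import time

    NTP_SERVER = "pool.ntp.org"    # any reachable NTP server would do
    NTP_TO_UNIX = 2208988800       # seconds between the NTP epoch (1900) and the Unix epoch (1970)

    def ntp_time(server=NTP_SERVER, timeout=5):
        """Return the server's current time as Unix seconds (simple SNTP query)."""
        packet = b"\x1b" + 47 * b"\x00"   # LI=0, VN=3, Mode=3 (client request)
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout)
        try:
            sock.sendto(packet, (server, 123))
            data, _ = sock.recvfrom(48)
        finally:
            sock.close()
        seconds = struct.unpack("!I", data[40:44])[0]   # transmit timestamp, seconds field
        return seconds - NTP_TO_UNIX

    if __name__ == "__main__":
        drift = time.time() - ntp_time()
        print("local clock differs from NTP by %.2f seconds" % drift)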
On 5/22/13 9:56 AM, "Marcus Sorensen" wrote:

>If this were creating a new bug, for example "oh, your VPCs won't work anymore for this release", or "here's a new UI, but it's really buggy and barely functional", then I'd agree with this train of thought. Instead, we are saying "we recently found out that since 2.2.x CloudStack has had this behavior, and it will be fixed in 4.2"*. That's a totally different thing. If 4.1 ends up being a poor-quality release that everyone remembers compared to others, it's not going to be because we didn't address something that has been around for several releases and that nobody has noticed.
>
>* Assuming we verify that it's not a regression, which I'm still very interested in knowing.
>
>On Wed, May 22, 2013 at 9:51 AM, John Burwell wrote:
>> Marcus,
>>
>> I would say that the only thing worse for an open source project than not releasing is releasing a poor-quality release. A late release with high quality is soon forgotten. An on-time or late release with poor quality lingers in folks' memory. The KDE project made the near-fatal mistake of following the same logic when they released 4.0, and the reputation of KDE 4.x continues to suffer from it to this day. CloudStack is trusted to run at the core of our users' operations. In my view, if we err, we should err on the side of quality to avoid erosion of that trust. If we ever lost that trust, our new features would never be evaluated.
>>
>> Thanks,
>> -John
>>
>> On May 22, 2013, at 11:18 AM, Marcus Sorensen wrote:
>>
>>> Thanks for the response. Time sync is certainly an issue; I think one of the things we are trying to gauge is whether the system VM functionality has been impacted by time sync such that anyone has noticed or cared. That's not to detract from the point that having time sync is optimal, and affects a lot of things, but functionally, back to my item #1: can we confirm that earlier versions have gotten out of sync, and if so, do we have bug reports showing that it has mattered?
>>>
>>> To counter the argument, there are plenty of people looking for the features in 4.1 who wouldn't choose CloudStack because it's not released yet. Then there's the delay's impact on 4.2, and keeping all of those features out of people's hands as well.
>>>
>>> For me, the fear is that we end up pushing 4.1 back to or near where 4.2 would otherwise have been released, at which point we haven't really accomplished anything except delaying the release of the working features in 4.1.
>>>
>>> On Wed, May 22, 2013 at 9:09 AM, John Burwell wrote:
>>>> Marcus,
>>>>
>>>> For me, S3 integration and Xen feature parity are not the primary reasons that this defect should remain a blocker. Time synchronization is a basic and essential assumption for systems such as CloudStack. This defect yields file and log timestamps from secondary storage that are unreliable -- impacting customers in accredited environments (e.g. SOX) or those who rely on those timestamps for any downstream operations. It also stands as a significant impediment to operational debugging. Additionally, as others have pointed out, time drift also impacts encryption, and possibly handshake operations between the system VMs and the management server. While I appreciate and fully support a time-based release cycle, there has to be a quality threshold for any release. Looking at it from an operations perspective, failure to maintain time sync across components is unacceptable. Assuming I used Xen, I would ask myself, "Would I deploy 4.1.0 if the known issues list stated that the system VMs could not maintain time sync?", and without hesitation I would answer, "No," and quickly follow it up with, "Oh no, I hope the release I have in production doesn't have this problem."
>>>>
>>>> Thanks,
>>>> -John
>>>>
>>>> On May 22, 2013, at 10:35 AM, Marcus Sorensen wrote:
>>>>
>>>>> I feel like we need to clarify what's at risk here. Not to disrespect anyone's opinion, but I'm just not getting why this is being considered a major feature.
>>>>> I think the very idea of Xen not having feature parity (regardless of the feature) is distasteful to a lot of us, and it should be. But consider that we are already two months behind on a four-month release cycle, and it sounds like fixing this could take a month (if no issues are found, two weeks to qual the new template). We run a time-based release, not a feature-based release. Not all features are expected to be fully functional to get out the door. Isn't the correct option to just mark the feature experimental and tell people to run the newer template at their own risk if they want it?
>>>>>
>>>>> 1) We need to verify whether this bug has been around for a long time, because that will tell us how much it really matters and thus whether or not it's a blocker. This addresses the "timestamp of logs" and other issues not related to new features.
>>>>>
>>>>> 2) We need to reiterate exactly which features are being affected. The original e-mail lists "S3 integration" as the only feature affected. As far as I understand it, the actual feature impacted is the "secondary storage sync": if you have multiple zones and multiple secondary storages, it backs up and handles the copying of templates, etc., so you don't have to manually register them everywhere.
>>>>>
>>>>> I appreciate John's work on getting that secondary storage sync feature in place. I really wish we had noticed the issue earlier on; then we might not be having this discussion. That said, no disrespect intended toward John, but I'm having a hard time understanding how this is a feature worth holding up the release for. It's not a new primary or secondary storage type integration, and it's not a feature where the admin is helpless to do it themselves. If VPC doesn't work, the admin can't do anything about it. If this sync doesn't work, the admin writes a script that copies their stuff everywhere.
>>>>>
>>>>> Please, if anyone considers this a major feature worth blocking on, explain to us why. Are you willing to push back the release of all of the other new features, and push back the 4.2 features, to have this one feature in June, or whenever 4.1 gets out?
>>>>>
>>>>> On Wed, May 22, 2013 at 2:14 AM, Sebastien Goasguen wrote:
>>>>>> +1 on moving forward.
>>>>>>
>>>>>> On this issue and on the upgrade issue I have realized that we forgot about our time-based release philosophy.
>>>>>>
>>>>>> There will always be bugs in the software. If we know about them, we can acknowledge them in the release notes and get started quickly on the next releases.
>>>>>>
>>>>>> To keep it short, I am now of the opinion (and I know I am kind of changing my mind here) that we should release 4.1 ASAP and start working on the bug-fix versions right away.
>>>>>>
>>>>>> If we do release often, then folks stuck on a particular bug can expect a quick turnaround and a fix for their problems.
>>>>>>
>>>>>> -sebastien
>>>>>>
>>>>>> On May 22, 2013, at 2:59 AM, Mathias Mullins wrote:
>>>>>>
>>>>>>> -1 on this.
>>>>>>>
>>>>>>> New features really should be across the board for the hypervisors. Part of what distinguishes ACS is its support across Xen / VMware / KVM. Do we really want to start getting into the habit of pushing forward new features that are not across the fully functional hypervisors?
>>>>>>>
>>>>>>> I agree with Outback that this will also start to affect the Xen/XCP community by basically setting them apart and leaving them out of what a lot of people see as a major feature.
>>>>>>>
>>>>>>> I think it sets a really bad precedent. If it were Hyper-V, which is not fully functional and not a major feature set right now, I would be +1 on this.
>>>>>>>
>>>>>>> MHO,
>>>>>>> Matt
>>>>>>>
>>>>>>> On 5/20/13 4:15 PM, "Chip Childers" wrote:
>>>>>>>
>>>>>>>> All,
>>>>>>>>
>>>>>>>> As discussed on another thread [1], we identified a bug (CLOUDSTACK-2492) in the current 3.x system VMs, where the system VMs are not configured to sync their time with either the host HV or an NTP service. That bug affects the system VMs for all three primary HVs (KVM, Xen and vSphere). Patches have been committed addressing vSphere and KVM. It appears that a correction for Xen would require a re-build of the system VM image and a full round of regression testing of that image.
>>>>>>>>
>>>>>>>> Given that the discussion thread has not resulted in a consensus on this issue, I unfortunately believe that the only path forward is to call for a formal VOTE.
>>>>>>>>
>>>>>>>> Please respond with one of the following:
>>>>>>>>
>>>>>>>> +1: proceed with 4.1 without the Xen portion of CLOUDSTACK-2492 being resolved
>>>>>>>> +0: don't care one way or the other
>>>>>>>> -1: do *not* proceed with any further 4.1 release candidates until CLOUDSTACK-2492 has been fully resolved
>>>>>>>>
>>>>>>>> -chip
>>>>>>>>
>>>>>>>> [1] http://markmail.org/message/rw7vciq3r33biasb
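As an aside on Marcus's remark above that "the admin writes a script that copies their stuff everywhere": a rough sketch of such a manual workaround might look like the following. The mount points are hypothetical, it only copies files between NFS exports, and registering the templates with the management server (via the API or UI) would still be a separate step that this does not perform.

    #!/usr/bin/env python
    # Sketch of the manual workaround: push templates from one secondary storage
    # NFS export to the others so every zone sees the same files. Mount points
    # below are hypothetical examples, not actual CloudStack paths.
    import os
    import shutil

    SOURCE = "/mnt/secondary-zone1/template"      # hypothetical source mount
    TARGETS = [                                   # hypothetical destination mounts
        "/mnt/secondary-zone2/template",
        "/mnt/secondary-zone3/template",
    ]

    for target in TARGETS:
        for root, dirs, files in os.walk(SOURCE):
            rel = os.path.relpath(root, SOURCE)
            dest_dir = target if rel == "." else os.path.join(target, rel)
            if not os.path.isdir(dest_dir):
                os.makedirs(dest_dir)
            for name in files:
                src = os.path.join(root, name)
                dst = os.path.join(dest_dir, name)
                if not os.path.exists(dst):       # naive sync: only copy what is missing
                    shutil.copy2(src, dst)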