References: <20130305061505.GF32383@tpx>
Date: Thu, 7 Mar 2013 15:29:37 -0800
Subject: Re: [DISCUSS] stabilizing Hadoop releases wrt. downstream
From: Ted Yu
To: general@hadoop.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=20cf303a342d84117a04d75e14a2

--20cf303a342d84117a04d75e14a2
Content-Type: text/plain; charset=ISO-8859-1

Thanks Bobby.

HBase trunk can build against the 2.0 SNAPSHOT so that regressions can be
detected early.

On Tue, Mar 5, 2013 at 7:18 AM, Robert Evans wrote:
> That is a great point. I have been meaning to set up the Jenkins build
> for branch-2 for a while, so I took the 10 mins and just did it.
>
> https://builds.apache.org/job/Hadoop-Common-2-Commit/
>
> Don't let the name fool you, it publishes not just common, but HDFS, YARN,
> MR, and tools too. You should now have branch-2 SNAPSHOTs updated on each
> commit to branch-2. Feel free to bug me if you need more integration
> points. I am not an RE guy, but I can hack it to make things work :)
>
> --Bobby
>
> On 3/5/13 12:15 AM, "Konstantin Boudnik" wrote:
>
> >Arun,
> >
> >first of all, I don't think anyone is trying to put the blame on someone
> >else. E.g. I had a similar experience with Oozie being broken because of
> >certain released changes in the upstream.
> >
> >I am sure that most people in the BigTop community - especially those who
> >share the committer-ship privilege in BigTop and other upstream
> >projects, including Hadoop - would be happy to help with the
> >stabilization of the Hadoop base.
> >The issue that a downstream
> >integration project is likely to have is - for one - the absence of
> >regularly published development artifacts. In the light of "it didn't
> >happen if there's no picture" here's a couple of examples:
> >
> > - 2.0.2-SNAPSHOT artifacts weren't published at all; only release
> >   2.0.2-alpha artifacts were
> > - 2.0.3-SNAPSHOT artifacts weren't published until Feb 29, 2013 (it
> >   happened just once)
> >
> >So, technically speaking, unless an integration project is willing to
> >build and maintain its own artifacts, it is impossible to do any
> >preventive validation.
> >
> >Which brings me to my next question: how do you guys address
> >"Integration is high on the list of *every* release"? Again, please
> >don't get me wrong - I am not looking to lay blame on or corner
> >anyone - I am really curious and would appreciate the input.
> >
> >
> >Vinod:
> >
> >> As you yourself noted later, the pain is part of the 'alpha' status
> >> of the release. We are targeting one of the immediate future
> >> releases to be a beta and so these troubles are really only the
> >> short term.
> >
> >I don't really want to get into the discussion about what
> >constitutes the alpha and how it has delayed the adoption of the Hadoop2
> >line. However, I want to point out that it is especially important for
> >an "alpha" platform to work nicely with downstream consumers of the said
> >platform. For quite obvious reasons, I believe.
> >
> >> I think there is a fundamental problem with the interaction of
> >> Bigtop with the downstream projects, if nothing else, with
> >
> >BigTop is as downstream as it can get, because BigTop essentially
> >consumes all other component releases in order to produce a viable
> >stack. Technicalities aside...
> >
> >> Hadoop. We never formalized on the process, will BigTop step in
> >> after an RC is up for vote or before?
> >> As I see it, it's happening
> >
> >Bigtop essentially can give any component, including Hadoop, and
> >better yet - the set of components - certain guarantees about
> >compatibility and dependencies being included. A case in point is
> >the commons libraries missing from the 1.0.1 release, which essentially
> >prevented HBase from working properly.
> >
> >> after the vote is up, so no wonder we are in this state. Shall we
> >> have a pre-notice to Bigtop so that it can step in before?
> >
> >The above is in contradiction with the earlier statement of "Integration
> >is high on the list of *every* release". If BigTop isn't used for
> >integration testing, then how is said integration testing performed?
> >Is it some sort of test-patch process as Luke referred to earlier? And
> >why does it leave room for integration issues to go uncaught?
> >Again, I am genuinely interested to know.
> >
> >> these short term pains. I'd rather like us swim through these now
> >> instead of support broken APIs and features in our beta, having seen
> >> this very thing happen with 1.*.
> >
> >I think you're mixing the point of integration with downstream and
> >being in an alpha phase of the development. The former isn't about
> >supporting "broken APIs" - it is about being consistent and avoiding
> >breaking the downstream applications without first letting said
> >applications accommodate the platform changes.
> >
> >Changes in the API, after all, can be relatively easily traced by
> >integration validation - this is the whole point of integration
> >testing. And BigTop does the job better than anything around, simply
> >because there's nothing else around to do it.
> >
> >If you stay in a shape-shifting "alpha" that doesn't integrate well for
> >a very long time, you risk losing downstream customers' interest,
> >because they might get tired of waiting until the next stable API is
> >ready for them.
> >
> >> Let's fix the way the release related communication is happening
> >> across our projects so that we can all work together and make 2.X a
> >> success.
> >
> >This is a very good point indeed! Let's start a separate discussion
> >thread on how we can improve the release model for coming Hadoop
> >releases, where we - as the community - can provide better guarantees
> >of the inter-component compatibility (sorry for an overused word).
> >
> >Cos
> >
> >On Fri, Mar 01, 2013 at 10:58AM, Arun C Murthy wrote:
> >> I feel this is being blown out of proportion.
> >>
> >> Integration is high on the list of *every* release. In future, if
> >> anyone or bigtop wants to help, running integration tests on a hadoop
> >> RC and providing feedback would be very welcome. I'm pretty sure I will
> >> stop an RC if it means it breaks Oozie or HBase or Pig or Hive and
> >> re-spin it. For e.g. see recent efforts to do a 2.0.4-alpha.
> >>
> >> With hadoop-2.0.3-alpha we discovered 3 *bugs* - making it sound like we
> >> intentionally disregard integration issues is very harsh.
> >>
> >> Please also see the other thread where we discussed stabilizing APIs,
> >> protocols etc. for the next 'beta' release.
> >>
> >> Arun
> >>
> >> On Feb 26, 2013, at 5:43 PM, Roman Shaposhnik wrote:
> >>
> >> > Hi!
> >> >
> >> > for the past couple of releases of the Hadoop 2.X code line the issue
> >> > of integration between Hadoop and its downstream projects has
> >> > become quite a thorny issue. The poster child here is Oozie, where
> >> > every release of Hadoop 2.X seems to be breaking the compatibility
> >> > in various unpredictable ways. At times other components (such
> >> > as HBase for example) also seem to be affected.
> >> >
> >> > Now, to be extremely clear -- I'm NOT talking about the *latest*
> >> > version of Oozie working with the *latest* version of Hadoop, instead
> >> > my observations come from running previous *stable* releases
> >> > of Bigtop on top of Hadoop 2.X RCs.
> >> >
> >> > As many of you know Apache Bigtop aims at providing a single
> >> > platform for integration of Hadoop and Hadoop ecosystem projects.
> >> > As such we're uniquely positioned to track compatibility between
> >> > different Hadoop releases with regard to the downstream components
> >> > (things like Oozie, Pig, Hive, Mahout, etc.). For every single RC
> >> > we've been pretty diligent at trying to provide integration-level
> >> > feedback on the quality of the upcoming release, but it seems that our
> >> > efforts don't quite suffice to stabilize Hadoop 2.X.
> >> >
> >> > Of course, one could argue that while the Hadoop 2.X code line was
> >> > designated 'alpha', expecting much in the way of perfect integration
> >> > and compatibility was NOT what the Hadoop community was
> >> > focusing on. I can appreciate that view, but what I'm interested in
> >> > is the future of Hadoop 2.X not its past. Hence, here's my question
> >> > to all of you as a Hadoop community at large:
> >> >
> >> > Do you guys think that the project has reached a point where
> >> > integration and compatibility issues should be prioritized really high
> >> > on the list of things that make or break each future release?
> >> >
> >> > The good news is that Bigtop's charter is in big part *exactly* about
> >> > providing you with this kind of feedback. We can easily tell you when
> >> > Hadoop behavior, with regard to downstream components, changes
> >> > between a previous stable release and the new RC (or even
> >> > branch/trunk).
> >> > What we can NOT do is submit patches for all the issues. We are simply
> >> > too small a project and we need your help with that.
> >> >
> >> > I truly believe that we owe it to the downstream projects, and in the
> >> > second half of this email I will try to convince you of that.
> >> >
> >> > We all know that integration projects are impossible to pull off
> >> > unless there's a general consensus between all of the projects
> >> > involved that they indeed need to work with each other. You can NOT
> >> > force that notion, but you can always try to influence. This
> >> > relationship goes both ways.
> >> >
> >> > Consider a question in front of the downstream communities
> >> > of whether or not to adopt Hadoop 2.X as the basis. To answer
> >> > that question each downstream project has to be reasonably
> >> > sure that their concerns will NOT fall on deaf ears and that
> >> > Hadoop developers are, essentially, 'ready' for them to pick
> >> > up Hadoop 2.X. I would argue that so far the Hadoop community
> >> > has gone out of its way to signal that the 2.X codeline is NOT
> >> > ready for the downstream.
> >> >
> >> > I would argue that moving forward this is a really unfortunate
> >> > situation that may end up undermining the long term success
> >> > of Hadoop 2.X if we don't start addressing the problem. Think
> >> > about it -- 90% of unit tests that run downstream on Apache
> >> > infrastructure are still exercising Hadoop 1.X underneath.
> >> > In fact, if you were to forcefully make, let's say, HBase's
> >> > unit tests run on top of Hadoop 2.X, quite a few of them
> >> > are going to fail. The Hadoop community is, in effect, cutting
> >> > itself off from the biggest source of feedback -- its downstream
> >> > users. This in turn:
> >> >
> >> > * leaves the Hadoop project in a perpetual state of broken
> >> >   windows syndrome.
> >> >
> >> > * leaves Apache Hadoop 2.X releases in a state considerably
> >> >   inferior to the releases *including* Apache Hadoop done by the
> >> >   vendors.
> >> >   The users have no choice but to align themselves
> >> >   with vendor offerings if they wish to utilize the latest Hadoop
> >> >   functionality.
> >> >   The artifact that is known as Apache Hadoop 2.X stopped being
> >> >   a viable choice, thus fracturing the user community and reducing
> >> >   the benefits of a commonly deployed codebase.
> >> >
> >> > * leaves downstream projects of Hadoop in a jaded state where
> >> >   they legitimately get very discouraged and frustrated and
> >> >   eventually give up, thinking that -- well, we work with one release
> >> >   of Hadoop (the stable one, Hadoop 1.X) and we shall wait for the
> >> >   Hadoop community to get their act together.
> >> >
> >> > In my view (shared by quite a few members of Apache Bigtop) we
> >> > can definitely do better than this if we all agree that the proposed
> >> > first 'beta' release of Hadoop 2.0.4 is the right time for it to
> >> > happen.
> >> >
> >> > It is about time the Hadoop 2.X community wins back all those end users
> >> > and downstream projects that got left behind during the alpha
> >> > stabilization phase.
> >> >
> >> > Thanks,
> >> > Roman.
> >>
> >> --
> >> Arun C. Murthy
> >> Hortonworks Inc.
> >> http://hortonworks.com/
> >>
> >>
> >

--20cf303a342d84117a04d75e14a2--