Reply-To: chris@embree.us
Date: Tue, 26 Feb 2013 21:52:55 -0500
Subject: Re: [DISCUSS] stabilizing Hadoop releases wrt. downstream
From: Chris Embree
To: general@hadoop.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=e89a8ff1c30ef3982704d6abde54

--e89a8ff1c30ef3982704d6abde54
Content-Type: text/plain; charset=ISO-8859-1

Hey Roman,

I don't want to hijack your topic with tech talk if it detracts from your
primary purpose, so please redirect me as you see fit.

There are a lot of details, but it seems that the primary problem is that
changes that "break" other code are introduced without being specifically
included. It's one thing to optimize a function and keep all of the
functionality the same; no segregation should be required. If you're going
to change the functionality of an API call or existing routine, it should be
sectioned off and specifically included (we've got plenty of xml) until the
old way has been deprecated for 2 or 3 releases.

I have some ideas but I want to make sure this is the forum. :)

Thanks for all of your work.
Chris

On Tue, Feb 26, 2013 at 8:43 PM, Roman Shaposhnik wrote:

> Hi!
>
> For the past couple of releases of the Hadoop 2.X code line,
> integration between Hadoop and its downstream projects has
> become quite a thorny issue. The poster child here is Oozie, where
> every release of Hadoop 2.X seems to break compatibility
> in various unpredictable ways. At times other components (such
> as HBase, for example) also seem to be affected.
>
> Now, to be extremely clear -- I'm NOT talking about the *latest* version
> of Oozie working with the *latest* version of Hadoop. Instead,
> my observations come from running previous *stable* releases
> of Bigtop on top of Hadoop 2.X RCs.
>
> As many of you know, Apache Bigtop aims at providing a single
> platform for integration of Hadoop and Hadoop ecosystem projects.
> As such we're uniquely positioned to track compatibility between
> different Hadoop releases with regard to the downstream components
> (things like Oozie, Pig, Hive, Mahout, etc.). For every single RC
> we've been pretty diligent at trying to provide integration-level feedback
> on the quality of the upcoming release, but it seems that our efforts
> don't quite suffice to stabilize Hadoop 2.X.
>
> Of course, one could argue that while the Hadoop 2.X code line was
> designated 'alpha', perfect integration and compatibility were NOT
> what the Hadoop community was focusing on. I can appreciate that
> view, but what I'm interested in is the future of Hadoop 2.X, not
> its past. Hence, here's my question to all of you as a Hadoop
> community at large:
>
> Do you guys think that the project has reached a point where integration
> and compatibility issues should be prioritized really high on the list
> of things that make or break each future release?
>
> The good news is that Bigtop's charter is in big part *exactly* about
> providing you with this kind of feedback. We can easily tell you when
> Hadoop behavior, with regard to downstream components, changes
> between a previous stable release and the new RC (or even branch/trunk).
> What we can NOT do is submit patches for all the issues. We are simply
> too small a project and we need your help with that.
>
> I truly believe that we owe it to the downstream projects, and in the
> second half of this email I will try to convince you of that.
>
> We all know that integration projects are impossible to pull off
> unless there's a general consensus between all of the projects involved
> that they indeed need to work with each other. You can NOT force
> that notion, but you can always try to influence it. This relationship
> goes both ways.
>
> Consider the question in front of the downstream communities
> of whether or not to adopt Hadoop 2.X as the basis. To answer
> that question each downstream project has to be reasonably
> sure that their concerns will NOT fall on deaf ears and that
> Hadoop developers are, essentially, 'ready' for them to pick
> up Hadoop 2.X. I would argue that so far the Hadoop community
> has gone out of its way to signal that the 2.X codeline is NOT
> ready for the downstream.
>
> I would argue that moving forward this is a really unfortunate
> situation that may end up undermining the long term success
> of Hadoop 2.X if we don't start addressing the problem. Think
> about it -- 90% of unit tests that run downstream on Apache
> infrastructure are still exercising Hadoop 1.X underneath.
> In fact, if you were to forcefully make, let's say, HBase's
> unit tests run on top of Hadoop 2.X, quite a few of them
> are going to fail. The Hadoop community is, in effect, cutting
> itself off from the biggest source of feedback -- its downstream
> users. This in turn:
>
> * leaves the Hadoop project in a perpetual state of broken
>   windows syndrome.
>
> * leaves Apache Hadoop 2.X releases in a state considerably
>   inferior to the releases *including* Apache Hadoop done by the
>   vendors. The users have no choice but to align themselves
>   with vendor offerings if they wish to utilize the latest Hadoop
>   functionality.
>   The artifact that is known as Apache Hadoop 2.X stopped being
>   a viable choice, thus fracturing the user community and reducing
>   the benefits of a commonly deployed codebase.
>
> * leaves downstream projects of Hadoop in a jaded state where
>   they legitimately get very discouraged and frustrated and eventually
>   give up, thinking that -- well, we work with one release of Hadoop
>   (the stable one, Hadoop 1.X) and we shall wait for the Hadoop
>   community to get their act together.
>
> In my view (shared by quite a few members of Apache Bigtop) we
> can definitely do better than this if we all agree that the proposed
> first 'beta' release of Hadoop 2.0.4 is the right time for it to happen.
>
> It is about time the Hadoop 2.X community wins back all those end users
> and downstream projects that got left behind during the alpha
> stabilization phase.
>
> Thanks,
> Roman.
>

--e89a8ff1c30ef3982704d6abde54--