Subject: Re: Looking to a Hadoop 3 release
From: Allen Wittenauer
Date: Thu, 5 Mar 2015 21:21:56 -0800
To: yarn-dev@hadoop.apache.org
Cc: "common-dev@hadoop.apache.org", "mapreduce-dev@hadoop.apache.org", "hdfs-dev@hadoop.apache.org"
Reply-To: hdfs-dev@hadoop.apache.org
Message-Id: <85C98170-BDF9-454D-9211-7A320D1B8AB9@altiscale.com>
References: <1425349807827.88706@hortonworks.com> <1425421960667.60647@hortonworks.com>

Is there going to be a general upgrade of dependencies? I'm thinking of
jetty & jackson in particular.

On Mar 5, 2015, at 5:24 PM, Andrew Wang wrote:

> I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
> page. In addition to the two things I've been pushing, I also looked
> through Allen's list (thanks Allen for making this) and picked out the
> shell script rewrite and the removal of HFTP as big changes. This would be
> the place to propose features for inclusion in 3.x, I'd particularly
> appreciate help on the YARN/MR side.
>
> Based on what I'm hearing, let me modulate my proposal to the following:
>
> - We avoid cutting branch-3, and release off of trunk. The trunk-only
> changes don't look that scary, so I think this is fine. This does mean we
> need to be more rigorous before merging branches to trunk. I think
> Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
> be very helpful in this regard.
> - We do not include anything to break wire compatibility unless (as Jason
> says) it's an unbelievably awesome feature.
> - No harm in rolling alphas from trunk, as it doesn't lock us to anything
> compatibility wise. Downstreams like releases.
>
> I'll take Steve's advice about not locking GA to a given date, but I also
> share his belief that we can alpha/beta/GA faster than it took for Hadoop
> 2. Let's roll some intermediate releases, work on the roadmap items, and
> see how we're feeling in a few months.
>
> Best,
> Andrew
>
> On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth wrote:
>
>> I think it'll be useful to have a discussion about what else people would
>> like to see in Hadoop 3.x - especially if the change is potentially
>> incompatible.
>> Also, what we expect the release schedule to be for major
>> releases and what triggers them - JVM version, major features, the need for
>> incompatible changes? Assuming major versions will not be released every 6
>> months/1 year (adoption time, fairly disruptive for downstream projects,
>> and users) - considering additional features/incompatible changes for 3.x
>> would be useful.
>>
>> Some features that come to mind immediately would be
>> 1) enhancements to the RPC mechanics - specifically support for async RPC /
>> two way communication. There's a lot of places where we re-use heartbeats
>> to send more information than what would be done if the RPC layer supported
>> these features. Some of this can be done in a compatible manner to the
>> existing RPC sub-system. Others like 2 way communication probably cannot.
>> After this, having HDFS/YARN actually make use of these changes. The other
>> consideration is adoption of an alternate system like gRPC which would be
>> incompatible.
>> 2) Simplification of configs - potentially separating client side configs
>> and those used by daemons. This is another source of perpetual confusion
>> for users.
>>
>> Thanks
>> - Sid
>>
>>
>> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran wrote:
>>
>>> Sorry, Outlook dequoted Alejandro's comments.
>>>
>>> Let me try again with his comments in italic and proofreading of mine
>>>
>>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com> wrote:
>>>
>>>
>>>
>>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com> wrote:
>>>
>>> IMO, if part of the community wants to take on the responsibility and
>>> work that it takes to do a new major release, we should not discourage
>>> them from doing that.
>>>
>>> Having multiple major branches active is a standard practice.
>>>
>>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
>>> long time to get out, and during that time 0.21, 0.22, got released and
>>> ignored; 0.23 picked up and used in production.
>>>
>>> The 2.0.4-alpha release was more of a troublespot as it got picked up
>>> widely enough to be used in products, and changes were made between that
>>> alpha & 2.2 itself which raised compatibility issues.
>>>
>>> For 3.x I'd propose
>>>
>>>
>>> 1. Have less longevity of 3.x alpha/beta artifacts
>>> 2. Make clear there are no guarantees of compatibility from alpha/beta
>>> releases to shipping. Best effort, but not to the extent that it gets in
>>> the way. More succinctly: we will care more about seamless migration from
>>> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>>> 3. Anybody who ships code based on 3.x alpha/beta to recognise and
>>> accept policy (2). Hadoop's "instability guarantee" for the 3.x
>>> alpha/beta phase
>>>
>>> As well as backwards compatibility, we need to think about forwards
>>> compatibility, with the goal being:
>>>
>>> Any app written/shipped with the 3.x release binaries (JAR and native)
>>> will work in and against a 3.y Hadoop cluster, for all x, y in Natural
>>> where y>=x and is-release(x) and is-release(y)
>>>
>>> That's important, as it means all server-side changes in 3.x which are
>>> expected to mandate client-side updates: protocols, HDFS erasure
>>> decoding, security features, must be considered complete and stable
>>> before we can say is-release(x). In an ideal world, we'll even get the
>>> semantics right with tests to show this.
>>>
>>> Fixing classpath hell downstream is certainly one feature I am +1 on.
>>> But: it's only one of the features, and given there's not any design doc
>>> on that JIRA, way too immature to set a release schedule on.
>>> An alpha schedule with no-guarantees and a regular alpha roll could be
>>> viable, as new features go in and can then be used to experimentally try
>>> this stuff in branches of HBase (well volunteered, Stack!), etc. Of
>>> course instability guarantees will be transitive downstream.
>>>
>>>
>>> This time around we are not replacing the guts as we did from Hadoop 1 to
>>> Hadoop 2, but superficial surgery to address issues that were not
>>> considered (or were too much to take on top of the guts transplant).
>>>
>>> For the split brain concern, we did a great job of maintaining Hadoop 1
>>> and Hadoop 2 until Hadoop 1 faded away.
>>>
>>> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
>>> compatibility.
>>>
>>>
>>> Based on that experience I would say that the coexistence of Hadoop 2 and
>>> Hadoop 3 will be much less demanding/traumatic.
>>>
>>> The re-layout of all the source trees was a major change there; assuming
>>> there's no refactoring or switch of build tools, picking things back
>>> will be tractable.
>>>
>>>
>>> Also, to facilitate the coexistence we should limit Java language
>>> features to Java 7 (even if the runtime is Java 8); once Java 7 is not
>>> used anymore we can remove this limitation.
>>>
>>> +1; setting javac.version will fix this
>>>
>>> What is nice about having Java 8 as the base JVM is that it means you can
>>> be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
>>> and libs can use all Java 8 features they want to.
>>>
>>> There's one policy change to consider there which is possibly, just
>>> possibly, we could allow new modules in hadoop-tools to adopt Java 8
>>> language features early, provided everyone recognised that "backport to
>>> branch-2" isn't going to happen.
>>>
>>> -Steve
>>>
>>>
>>