Date: Tue, 10 Mar 2015 12:40:55 -0700
Subject: Re: Hadoop 3.x: what about shipping trunk as a 2.x release in 2015?
From: Karthik Kambatla
To: "yarn-dev@hadoop.apache.org"
Cc: "mapreduce-dev@hadoop.apache.org", "hdfs-dev@hadoop.apache.org", "common-dev@hadoop.apache.org"
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001a1149404046501c0510f454d0

--001a1149404046501c0510f454d0
Content-Type: text/plain; charset=UTF-8

On Mon, Mar 9, 2015 at 2:15 PM, Steve Loughran wrote:

>
> If 3.x is going to be Java 8 & not backwards compatible, I don't expect
> anyone wanting to use this in production until some time deep into 2016.
>
> Issue: JDK 8 vs 7
>
> It will require Hadoop clusters to move up to Java 8. While there's dev
> pull for this, there's ops pull against this: people are still in the
> moving-off Java 6 phase due to that "it's working, don't update it"
> philosophy. Java 8 is compelling to us coders, but that doesn't mean ops
> want it.
>
> You can run JDK-8 code in a YARN cluster running on Hadoop 2.7 *today*,
> the main thing is setting up JAVA_HOME. That's something we could make
> easier somehow (maybe some min Java version field in resource requests that
> will let apps say java 8, java 9, ...). YARN could not only set up JVM
> paths, it could fail-fast if a Java version wasn't available.
>
> What we can't do in hadoop core today is set javac.version=1.8 & use java
> 8 code.
> Downstream code can do that (Hive, etc); they just need to accept
> that they don't get to play on JDK7 clusters if they embrace
> lambda expressions.
>
> So...we need to stay on java 7 for some time due to ops pull; downstream
> apps get to choose what they want. We can/could enhance YARN to make JVM
> choice more declarative.
>
> Issue: Incompatible changes
>
> Without knowing what is proposed for "an incompatible classpath change", I
> can't say whether this is something that could be made optional. If it
> isn't, then it is a python-3 class option, "rewrite your code" event, which
> is going to be particularly traumatic to things like Hive that already do
> complex CP games. I'm currently against any mandatory change here, though
> would love to see an optional one. And if optional, it ceases to become an
> incompatible change...
>

We should probably start qualifying the word "incompatible" more often.

Are we okay with an API-incompatible Hadoop-3? No.

Are we okay with a wire-incompatible Hadoop-3? Likely not.

Are we okay with breaking other forms of compatibility for Hadoop-3, like
behavior, dependencies, JDK, classpath, environment? I think so.

Are we okay with breaking these forms of compatibility in future Hadoop-2.x?
Likely not. Does our compatibility policy allow these changes in 2.x? Mostly
yes, but that is because we don't have policies for a lot of these things
that affect end-users. The reason we don't have a policy, IMO, is a
combination of (1) we haven't spent enough time thinking about them, and
(2) without things like classpath isolation, we end up tying developers'
hands if we don't let them change the dependencies. I propose we update our
compat guidelines to be stricter, and do whatever is required to get there.
Is it okay to change our compat guidelines incompatibly? Maybe it warrants a
Hadoop-3? I don't know yet.

Also, some other policy changes, like bumping the minimum JDK requirement,
are allowed in minor releases.
Users might be okay with certain JDK bumps (6 to 7, since no one seems to be
using 6 anymore), but users most definitely care about some other bumps
(7 to 8). If we want to remove this subjective evaluation, I am open to
requiring a major version for JDK upgrades (not support, but language
features), even if it means we have to wait until 3.0 for a JDK upgrade.

>
> Issue: Getting trunk out the door
>
> The main diff from branch-2 and trunk is currently the bash script
> changes. These don't break client apps. May or may not break bigtop & other
> downstream hadoop stacks, but developers don't need to worry about this:
> no recompilation necessary
>
> Proposed: ship trunk as a 2.x release, compatible with JDK7 & Java code.
>
> It seems to me that I could go
>
> git checkout trunk
> mvn versions:set -DnewVersion=2.8.0-SNAPSHOT
>
> We'd then have a version of Hadoop-trunk we could ship later this year,
> compatible at the JDK and API level with the existing java code & JDK7+
> clusters.
>
> A classpath fix that is optional/compatible can then go out on the 2.x
> line, saving the 3.x tag for something that really breaks things, forces
> all downstream apps to set up new hadoop profiles, have separate modules &
> generally hate the hadoop dev team
>
> This lets us tick off the "recent trunk release" and "fixed shell scripts"
> items, pushing out those benefits to people sooner rather than later, and
> puts off the "Hello, we've just broken your code" event for another 12+
> months.
>
> Comments?
>
> -Steve
>
>
>

--
Karthik Kambatla
Software Engineer, Cloudera Inc.
--------------------------------------------
http://five.sentenc.es

--001a1149404046501c0510f454d0--