Reply-To: hdfs-dev@hadoop.apache.org
Content-Type: text/plain; charset=us-ascii
Mime-Version: 1.0 (Mac OS X Mail 6.6 \(1510\))
Subject: Re: Looking to a Hadoop 3 release
From: Allen Wittenauer <aw@altiscale.com>
In-Reply-To: 
 <CAGB5D2bMyCb5Q73Mm4UTtKFFUpqVer7LpdcVFw6SnB=WVC91XA@mail.gmail.com>
Date: Thu, 5 Mar 2015 21:21:56 -0800
Cc: "common-dev@hadoop.apache.org" <common-dev@hadoop.apache.org>,
        "mapreduce-dev@hadoop.apache.org" <mapreduce-dev@hadoop.apache.org>,
        "hdfs-dev@hadoop.apache.org" <hdfs-dev@hadoop.apache.org>
Message-Id: <85C98170-BDF9-454D-9211-7A320D1B8AB9@altiscale.com>
References: 
 <CAGB5D2Za9UHLjw1_+xNcJ0nSyWkJT2fXogCq_SUi6baFEv6+gQ@mail.gmail.com>
 <1425349807827.88706@hortonworks.com>
 <CAGB5D2a=5_46UcyKc+EoG08gYo=R-VDj+rnr7q9pjxj2kjuVNA@mail.gmail.com>
 <C48CA17E-D191-44BF-80E5-1F472CBF712E@hortonworks.com>
 <1425421960667.60647@hortonworks.com>
 <CAGB5D2Z+tWoagDO3i6Y5kEUPygHEip2MyRrLLB0Yh7vh+MCKGA@mail.gmail.com>
 <CFB90E86-24D5-4E49-90D1-B6B873E77B70@hortonworks.com>
 <CACDdcgdyPXqmn3W5tOGiv0fp2xkUz=5WoJzwXKZsZkeGP9UWSg@mail.gmail.com>
 <D11E0C57.13B5A%stevel@hortonworks.com>
 <D11E1B94.13C2F%stevel@hortonworks.com>
 <CAOapipsTzkX6WCHRR+sHSBd8FGkq4kOAveHys9knkgKCP_ODoQ@mail.gmail.com>
 <CAGB5D2bMyCb5Q73Mm4UTtKFFUpqVer7LpdcVFw6SnB=WVC91XA@mail.gmail.com>
To: yarn-dev@hadoop.apache.org


Is there going to be a general upgrade of dependencies?  I'm thinking of
jetty & jackson in particular.

On Mar 5, 2015, at 5:24 PM, Andrew Wang <andrew.wang@cloudera.com> wrote:

> I've taken the liberty of adding a Hadoop 3 section to the Roadmap wiki
> page. In addition to the two things I've been pushing, I also looked
> through Allen's list (thanks Allen for making this) and picked out the
> shell script rewrite and the removal of HFTP as big changes. This would be
> the place to propose features for inclusion in 3.x; I'd particularly
> appreciate help on the YARN/MR side.
>
> Based on what I'm hearing, let me modulate my proposal to the following:
>
> - We avoid cutting branch-3, and release off of trunk. The trunk-only
> changes don't look that scary, so I think this is fine. This does mean we
> need to be more rigorous before merging branches to trunk. I think
> Vinod/Giri's work on getting test-patch.sh runs on non-trunk branches would
> be very helpful in this regard.
> - We do not include anything to break wire compatibility unless (as Jason
> says) it's an unbelievably awesome feature.
> - No harm in rolling alphas from trunk, as it doesn't lock us to anything
> compatibility wise. Downstreams like releases.
>
> I'll take Steve's advice about not locking GA to a given date, but I also
> share his belief that we can alpha/beta/GA faster than it took for Hadoop
> 2. Let's roll some intermediate releases, work on the roadmap items, and
> see how we're feeling in a few months.
>
> Best,
> Andrew
>
> On Thu, Mar 5, 2015 at 3:21 PM, Siddharth Seth <sseth@apache.org> wrote:
>
>> I think it'll be useful to have a discussion about what else people would
>> like to see in Hadoop 3.x - especially if the change is potentially
>> incompatible. Also, what we expect the release schedule to be for major
>> releases and what triggers them - JVM version, major features, the need for
>> incompatible changes? Assuming major versions will not be released every 6
>> months/1 year (adoption time, fairly disruptive for downstream projects,
>> and users) - considering additional features/incompatible changes for 3.x
>> would be useful.
>>
>> Some features that come to mind immediately would be
>> 1) enhancements to the RPC mechanics - specifically support for async RPC /
>> two-way communication. There are a lot of places where we re-use heartbeats
>> to send more information than what would be done if the RPC layer supported
>> these features. Some of this can be done in a compatible manner to the
>> existing RPC sub-system. Others like two-way communication probably cannot.
>> After this, having HDFS/YARN actually make use of these changes. The other
>> consideration is adoption of an alternate system like gRPC which would be
>> incompatible.
>> 2) Simplification of configs - potentially separating client side configs
>> and those used by daemons. This is another source of perpetual confusion
>> for users.
>>
>> Thanks
>> - Sid
>>
>>
>> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <stevel@hortonworks.com>
>> wrote:
>>
>>> Sorry, Outlook dequoted Alejandro's comments.
>>>
>>> Let me try again with his comments in italic and proofreading of mine
>>>
>>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com> wrote:
>>>
>>>
>>>
>>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com> wrote:
>>>
>>> IMO, if part of the community wants to take on the responsibility and work
>>> that it takes to do a new major release, we should not discourage them from
>>> doing that.
>>>
>>> Having multiple major branches active is a standard practice.
>>>
>>> Looking @ 2.x, the major work (HDFS HA, YARN) meant that it did take a
>>> long time to get out, and during that time 0.21 and 0.22 got released and
>>> ignored; 0.23 was picked up and used in production.
>>>
>>> The 2.0.4-alpha release was more of a troublespot as it got picked up
>>> widely enough to be used in products, and changes were made between that
>>> alpha & 2.2 itself which raised compatibility issues.
>>>
>>> For 3.x I'd propose
>>>
>>>
>>>  1.  Have less longevity of 3.x alpha/beta artifacts
>>>  2.  Make clear there are no guarantees of compatibility from alpha/beta
>>> releases to shipping. Best effort, but not to the extent that it gets in
>>> the way. More succinctly: we will care more about seamless migration from
>>> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>>>  3.  Anybody who ships code based on 3.x alpha/beta to recognise and
>>> accept policy (2). Hadoop's "instability guarantee" for the 3.x alpha/beta
>>> phase
>>>
>>> As well as backwards compatibility, we need to think about forwards
>>> compatibility, with the goal being:
>>>
>>> Any app written/shipped with the 3.x release binaries (JAR and native)
>>> will work in and against a 3.y Hadoop cluster, for all x, y in Natural
>>> where y>=x and is-release(x) and is-release(y)
>>>
>>> That's important, as it means all server-side changes in 3.x which are
>>> expected to mandate client-side updates: protocols, HDFS erasure
>>> decoding, security features, must be considered complete and stable before
>>> we can say is-release(x). In an ideal world, we'll even get the semantics
>>> right with tests to show this.
>>>
>>> Fixing classpath hell downstream is certainly one feature I am +1 on. But:
>>> it's only one of the features, and given there's not any design doc on that
>>> JIRA, way too immature to set a release schedule on. An alpha schedule with
>>> no guarantees and a regular alpha roll could be viable, as new features go
>>> in and can then be used to experimentally try this stuff in branches of
>>> HBase (well volunteered, Stack!), etc. Of course instability guarantees
>>> will be transitive downstream.
>>>
>>>
>>> This time around we are not replacing the guts as we did from Hadoop 1 to
>>> Hadoop 2, but doing superficial surgery to address issues that were not
>>> considered (or were too much to take on top of the guts transplant).
>>>
>>> For the split brain concern, we did a great job of maintaining Hadoop 1 and
>>> Hadoop 2 until Hadoop 1 faded away.
>>>
>>> And a significant argument about 2.0.4-alpha to 2.2 protobuf/HDFS
>>> compatibility.
>>>
>>>
>>> Based on that experience I would say that the coexistence of Hadoop 2 and
>>> Hadoop 3 will be much less demanding/traumatic.
>>>
>>> The re-layout of all the source trees was a major change there; assuming
>>> there's no refactoring or switch of build tools, picking things back
>>> will be tractable.
>>>
>>>
>>> Also, to facilitate the coexistence we should limit Java language features
>>> to Java 7 (even if the runtime is Java 8); once Java 7 is not used anymore
>>> we can remove this limitation.
>>>
>>> +1; setting javac.version will fix this
>>>
>>> What is nice about having Java 8 as the base JVM is that it means you can
>>> be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
>>> and libs can use all Java 8 features they want to.
>>>
>>> There's one policy change to consider there, which is that possibly, just
>>> possibly, we could allow new modules in hadoop-tools to adopt Java 8
>>> language features early, provided everyone recognised that "backport to
>>> branch-2" isn't going to happen.
>>>
>>> -Steve
>>>
>>>
>>