hadoop-hdfs-dev mailing list archives

From Vinod Kumar Vavilapalli <vino...@hortonworks.com>
Subject Re: Looking to a Hadoop 3 release
Date Sat, 07 Mar 2015 00:19:35 GMT
Yes, these are the kind of enhancements that need to be proposed and discussed for inclusion!

Thanks,
+Vinod

On Mar 5, 2015, at 3:21 PM, Siddharth Seth <sseth@apache.org> wrote:


> Some features that come to mind immediately would be
> 1) enhancements to the RPC mechanics - specifically support for async RPC /
> two-way communication. There are a lot of places where we re-use heartbeats
> to carry more information than they would need to if the RPC layer supported
> these features. Some of this can be done in a manner compatible with the
> existing RPC sub-system. Others, like two-way communication, probably cannot.
> After this, having HDFS/YARN actually make use of these changes. The other
> consideration is adoption of an alternate system like gRPC, which would be
> incompatible.
> 2) Simplification of configs - potentially separating client side configs
> and those used by daemons. This is another source of perpetual confusion
> for users.
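The heartbeat-piggybacking pattern Sid describes in point 1 can be sketched as follows. This is a hypothetical illustration, not Hadoop's actual RPC API: all interface and class names here are invented to show the difference between stuffing commands into a heartbeat response and a genuinely asynchronous call.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch (NOT Hadoop's real RPC interfaces): contrasts
// piggybacking work on heartbeat responses with an async RPC style.
public class AsyncRpcSketch {

    // Today's pattern: the server can only "push" work to a node by
    // embedding commands in the response to a node-initiated heartbeat.
    interface HeartbeatProtocol {
        String heartbeat(String nodeId); // response carries piggybacked commands
    }

    // With async RPC, a call returns a future that completes later; a
    // two-way channel would additionally let the server initiate calls.
    interface AsyncProtocol {
        CompletableFuture<String> invoke(String method, String arg);
    }

    // In-memory stand-in for a server, just to make the sketch runnable.
    static class InMemoryServer implements HeartbeatProtocol, AsyncProtocol {
        public String heartbeat(String nodeId) {
            return "commands-for-" + nodeId; // piggybacked payload
        }
        public CompletableFuture<String> invoke(String method, String arg) {
            // would complete asynchronously; completed immediately for the demo
            return CompletableFuture.completedFuture(method + "(" + arg + ")=ok");
        }
    }

    public static void main(String[] args) {
        InMemoryServer server = new InMemoryServer();
        System.out.println(server.heartbeat("dn-1"));
        System.out.println(server.invoke("replicate", "block-42").join());
    }
}
```

The compatibility point follows from the shape of the interfaces: an async variant of an existing one-way protocol can be layered on, but server-initiated calls invert who opens the exchange, which existing clients cannot accommodate.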
> 
> Thanks
> - Sid
> 
> 
> On Thu, Mar 5, 2015 at 2:46 PM, Steve Loughran <stevel@hortonworks.com>
> wrote:
> 
>> Sorry, outlook dequoted Alejandro's comments.
>> 
>> Let me try again with his comments in italic and proofreading of mine
>> 
>> On 05/03/2015 13:59, "Steve Loughran" <stevel@hortonworks.com<mailto:
>> stevel@hortonworks.com>> wrote:
>> 
>> 
>> 
>> On 05/03/2015 13:05, "Alejandro Abdelnur" <tucu00@gmail.com<mailto:
>> tucu00@gmail.com><mailto:tucu00@gmail.com>> wrote:
>> 
>> IMO, if part of the community wants to take on the responsibility and work
>> that takes to do a new major release, we should not discourage them from
>> doing that.
>> 
>> Having multiple major branches active is a standard practice.
>> 
>> Looking at 2.x, the major work (HDFS HA, YARN) meant that it took a
>> long time to get out, and during that time 0.21 and 0.22 got released and
>> ignored, while 0.23 was picked up and used in production.
>> 
>> The 2.0.4-alpha release was more of a trouble spot, as it got picked up
>> widely enough to be used in products, and changes made between that
>> alpha and 2.2 itself raised compatibility issues.
>> 
>> For 3.x I'd propose
>> 
>> 
>>  1.  Keep the 3.x alpha/beta artifacts short-lived
>>  2.  Make clear there are no guarantees of compatibility from alpha/beta
>> releases to shipping. Best effort, but not to the extent that it gets in
>> the way. More succinctly: we will care more about seamless migration from
>> 2.2+ to 3.x than from a 3.0-alpha to 3.3 production.
>>  3.  Ask anybody who ships code based on 3.x alpha/beta to recognise and
>> accept policy (2): Hadoop's "instability guarantee" for the 3.x alpha/beta
>> phase
>> 
>> As well as backwards compatibility, we need to think about forward
>> compatibility, with the goal being:
>> 
>> Any app written/shipped with the 3.x release binaries (JAR and native)
>> will work in and against a 3.y Hadoop cluster, for all x, y in Natural
>> where y>=x  and is-release(x) and is-release(y)
>> 
>> That's important, as it means all server-side changes in 3.x which are
>> expected to mandate client-side updates (protocols, HDFS erasure
>> coding, security features) must be considered complete and stable before
>> we can say is-release(x). In an ideal world, we'll even get the semantics
>> right, with tests to show this.
>> 
>> Fixing classpath hell downstream is certainly one feature I am +1 on. But:
>> it's only one of the features, and given there isn't any design doc on that
>> JIRA, it's way too immature to set a release schedule on. An alpha schedule
>> with no guarantees and a regular alpha roll could be viable, as new features
>> go in and can then be used to experimentally try this stuff in branches of
>> HBase (well volunteered, Stack!), etc. Of course instability guarantees
>> will be transitive downstream.
>> 
>> 
>> This time around we are not replacing the guts as we did from Hadoop 1 to
>> Hadoop 2, but doing superficial surgery to address issues that were not
>> considered (or were too much to take on top of the guts transplant).
>> 
>> For the split-brain concern, we did a great job of maintaining Hadoop 1 and
>> Hadoop 2 until Hadoop 1 faded away.
>> 
>> And there was a significant argument about 2.0.4-alpha to 2.2
>> protobuf/HDFS compatibility.
>> 
>> 
>> Based on that experience I would say that the coexistence of Hadoop 2 and
>> Hadoop 3 will be much less demanding/traumatic.
>> 
>> The re-layout of all the source trees was a major change there; assuming
>> there's no refactoring or switch of build tools, picking things back
>> will be tractable.
>> 
>> 
>> Also, to facilitate the coexistence we should limit Java language features
>> to Java 7 (even if the runtime is Java 8), once Java 7 is not used anymore
>> we can remove this limitation.
>> 
>> +1; setting javac.version will fix this
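For reference, roughly what that looks like in the build: pinning the language level via a property fed into the compiler plugin. The property name follows Steve's mention of `javac.version`; treat the exact wiring below as an assumption about the pom rather than a quote from it.

```xml
<!-- root pom.xml sketch: compile at the Java 7 language level even when
     building on a JDK 8, so the sources stay backportable -->
<properties>
  <javac.version>1.7</javac.version>
</properties>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-compiler-plugin</artifactId>
      <configuration>
        <source>${javac.version}</source>
        <target>${javac.version}</target>
      </configuration>
    </plugin>
  </plugins>
</build>
```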
>> 
>> What is nice about having Java 8 as the base JVM is that it means you can
>> be confident that all Hadoop 3 servers will be JDK8+, so downstream apps
>> and libs can use all the Java 8 features they want.
>> 
>> There's one policy change to consider there, which is possibly, just
>> possibly, that we could allow new modules in hadoop-tools to adopt Java 8
>> language features early, provided everyone recognised that "backport to
>> branch-2" isn't going to happen.
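As an illustration of why such a module would be unbackportable (hypothetical code, not from any Hadoop module): lambdas and streams are rejected by javac at `-source 1.7`, so any module using them is Java-8-only by construction.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class Java8Only {
    // Streams and lambdas fail to compile with -source 1.7, which is
    // exactly why "backport to branch-2" could never happen for this code.
    static List<String> liveNodes(List<String> reports) {
        return reports.stream()
                .filter(r -> r.endsWith(":LIVE"))
                .map(r -> r.split(":")[0])
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(
            liveNodes(Arrays.asList("dn1:LIVE", "dn2:DEAD", "dn3:LIVE")));
        // prints [dn1, dn3]
    }
}
```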
>> 
>> -Steve
>> 
>> 

