hadoop-general mailing list archives

From Arun C Murthy <...@hortonworks.com>
Subject Re: [DISCUSS] Spin out MR, HDFS and YARN as their own TLPs and disband Hadoop umbrella project
Date Thu, 30 Aug 2012 11:00:50 GMT

On Aug 30, 2012, at 3:12 AM, Konstantin Shvachko wrote:

> 2. From technical (not community) viewpoint your "svn copy" is an ugly
> approach,
> as it creates a lot of code duplication and will result in a
> maintenance nightmare or / and
> will require many man-months to fix. My point is that you cannot
> neglect "technical issues" when you solve community problems.

Agreed, Konstantin. I don't think Chris was being serious here - it was merely *one* way forward.


There are, easily, better ways to solve this.

The big cross-project dependencies are IPC/RPC, Security and Metrics2; others include the network
topology APIs. They need to be marked Public/Stable. We need to maintain compatibility
across a major (stable) release anyway, as is true for every other Public/Stable API.

So, *technically*, the requirements are:
a) Ensure projects only use Public/Stable apis.
b) Maintain compatibility for Public/Stable apis within a major release.
c) Clearly, key components like IPC, Metrics2, Security etc. *should* be marked Stable by the
time the ersatz hadoop-2 codebase is declared 'stable'.
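
To make requirement (a) concrete, here is a minimal sketch of how such a check might look. It
assumes runtime-visible audience/stability markers; the annotations below are local, hypothetical
stand-ins for Hadoop's InterfaceAudience/InterfaceStability classification annotations, and the
API type names are made up for illustration.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class ApiStabilityCheck {
    // Hypothetical stand-ins for Hadoop's audience/stability annotations,
    // declared locally so the sketch compiles without Hadoop on the classpath.
    @Retention(RetentionPolicy.RUNTIME) @interface Public {}
    @Retention(RetentionPolicy.RUNTIME) @interface Stable {}
    @Retention(RetentionPolicy.RUNTIME) @interface Unstable {}

    // Illustrative API types (names invented for this example).
    @Public @Stable interface RpcClient {}
    @Public @Unstable interface TopologyResolver {}
    interface InternalHelper {}

    // Requirement (a): a downstream project may depend on a type only if it
    // is marked both Public and Stable.
    static boolean isSafeDependency(Class<?> api) {
        return api.isAnnotationPresent(Public.class)
            && api.isAnnotationPresent(Stable.class);
    }

    public static void main(String[] args) {
        Class<?>[] candidates = {
            RpcClient.class, TopologyResolver.class, InternalHelper.class
        };
        for (Class<?> c : candidates) {
            System.out.println(c.getSimpleName() + " safe=" + isSafeDependency(c));
        }
    }
}
```

A real check would more likely run at build time over a downstream project's imports; the
reflection version above only illustrates the rule itself.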

None of these seem like the fashionably *scary* technical issues some people are using to
justify blocking the way forward.

And, no, YARN/MR aren't the only downstream projects in this mix - HBase, for example, uses
hadoop metrics2 and our security APIs. We need to support compatibility for HBase anyway.
There are several other projects in the same boat. Pig/Hive need FileSystem, Security &
MR APIs. This is just the *reality* of being at the bottom of the stack.

Yes, there is work left - but that work is something we need to do with or without the split.

Furthermore, yes, the previous split/unsplit was painful. However, beyond that, we have made
progress across several dimensions which should make this one smoother:
a) Mavenization has helped a *lot*.
b) Unlike the previous attempt, HDFS2 & YARN (vs. HDFS1 & MR1) no longer share the
same run-time scripts etc.
c) We have been fairly good at following through on our stability/visibility guarantees on
APIs.

As a result, I don't buy the *this is technically impossible* argument.

As Konstantin suggested, we could spend the next few weeks/months preparing.
Even after the split we would be in the alpha/beta stage, where we can recover from mistakes
at the cost of a few extra HDFS alpha/beta releases for the sake of the MR/YARN projects. That
seems like an acceptable cost given that there are several volunteers to RM releases.

Last but not least, the previous split failed because the overall community did not invest in
ensuring its success. That's clearly *not* the case this time around. I'm very confident of
that.

Arun