hadoop-general mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Update on hadoop-0.23
Date Fri, 30 Sep 2011 10:17:20 GMT
On 30/09/2011 06:27, Eric Baldeschwieler wrote:
> Hi Doug, Jeff, Roman
>
> Let me rephrase my point.  I'd like to request that folks take bigtop project discussions
> onto the bigtop lists and don't greet status reports on general@hadoop with insinuations that
> folks who are working really hard on this project should be contributing different things
> to another project or are somehow misbehaving by testing on their own infrastructure with
> their own users.  Any kind of testing is a gift to the community and adds value.  You are
> all welcome to contribute too.  If you find issues, then file JIRAs and work on the appropriate
> project lists.  I believe that observing these points of etiquette will help this project
> continue to prosper.

Bigtop is an attempt to have a coherent test & release process, with 
full stack testing, release artifacts tested on a set of platforms, and 
a codebase that has matured out of Cloudera. I don't care about origin, 
all I want is consistent releases of compatible artifacts -and the 
testing to back up the claims of compatibility. The artifacts should be 
those things people install -RPMs, debs. Ideally the tests should start 
on small clusters, then scale up to production size before release.

There are things happening in the hadoop core that mimic some of the 
features here -RPMs- but they appear to lack the full stack functional 
testing which is a goal of bigtop.

>
> I agree with you that the Hadoop project is healthy.

How do you define health in this context?

1. There is a 0.20.20x branch that is the one people use in production 
-the stable one. The API is behind the 0.21+ feature set, and so is less 
convenient to code against. It picks up features as well as fixes, which 
I find troublesome. You don't see new features going into RHEL 5.x or 
Ubuntu LTS releases. Yes, I know users like those features, but that 
could be due to the slow release of new versions that they trust to work 
and preserve data. It's healthy, but the backport of features creates 
inertia.

2. There is the 0.23 branch that everyone -especially Arun- is working 
on, which is really promising, though some of the features (federation, 
YARN) are going to be fairly traumatic in rollout. That doesn't mean 
they are bad, only that switching to them will have surprises.

3. There's 0.22 which is going to combine the API of 0.21 with the fixes 
of 0.20.20x *and* will be the last release of the MR1.0 engine. For that 
last reason, I think there's value in pushing it out, though it's going 
to take time, and there's a risk of it adding another branch to be 
maintained for an indeterminate period.

4. There are the third party "compatible" projects, CDH, MapR, EMC HD, 
Amazon Elastic MR, which are all declaring compatibility with 0.20.x, 
with no stated plans for when or how to move to 0.23+.

I would say Hadoop is incredibly successful -it's generating lots of 
interest, is being used by big companies, it has almost singlehandedly 
revitalised server-side Java dev, it is the foundation for an OSS 
version of the MS Azure stack. But for that latter goal to be achieved 
-it's what I want- we need to move forward on releases where the entire 
stack is consistent, releases that people want to use.

For that consistency, I'd like bigtop to be a subject people can talk 
about here, just like MRUnit, which will be needed now that 0.23+ 
removes the MiniMRCluster feature.

-steve
