hadoop-general mailing list archives

From Arun C Murthy <...@yahoo-inc.com>
Subject [DISCUSS] Hadoop Security Release off Yahoo! patchset
Date Tue, 24 Aug 2010 00:27:14 GMT
Even with the work on hadoop-0.22 (trunk) starting in earnest, it is
fairly obvious, given our past history, that it will take a while for
us to get it stable and deployable; e.g. it took us nearly 6 months
to deploy hadoop-0.20.

In the interim I'd like to propose we push a hadoop-0.20-security
release off the Yahoo! patchset (http://github.com/yahoo/hadoop-common).
This will ensure the community benefits *now* from all the work done
at Yahoo! over more than 12 months, rather than having to wait for
hadoop-0.22, which includes all of these patches.

Some salient aspects:
a) Full-fledged security implementation deployed at scale (4000 nodes)  
in production.
b) Lots of work on stabilizing and optimizing the NameNode and
JobTracker for over 12 months. This has been critical in deploying
Hadoop at scale, i.e. clusters of 4000 nodes. For example, we have a
50% improvement in CPU utilization on the JobTracker compared with
the hadoop-0.20.2 release.
c) Several new features in the scheduler (CapacityScheduler) and the
Map-Reduce framework, better support for multi-tenancy, etc.
d) Several performance and stability improvements to the system, e.g.
iterative ls and robustness against rogue clients/jobs/users.

Also, given the huge number of features and enhancements, I'd like to
propose we create a new 0.20-security branch and commit the Yahoo!
patchset there for the release.

This was proposed earlier by Doug but did not get far due to concerns
about the effect it would have on development on trunk. However, I
believe we can now point to demonstrable progress on trunk, and it
would be useful to have an interim, fully-tested Apache Hadoop
release available to the community.

Conceivably, one could imagine a Hadoop Security + Append release
soon after. For now, a Hadoop Security release alone would add
tremendous value for the reasons above. We would like to get this
release out quickly so that we can focus the majority of our efforts
on trunk.

Thoughts?

Arun

