hadoop-general mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: [DISCUSS] Hadoop Security Release off Yahoo! patchset
Date Fri, 14 Jan 2011 05:11:16 GMT
Hi Eli,

Thanks for the suggestion.

+1 to Nigel and Arun's proposal.

I completely support the idea of creating a version of 20 with append for HBase.  However,
the append issue is very complicated and there does not exist any version of append that is
certified against a workload as diverse as what this branch has been tested against.  I think
you are trying to cross too many streams here.   If you have resources to help integrate any
version of Hadoop 0.20 with append, package and test it, I fully support you doing so.  But
that effort is not aligned with the goal of this branch, which is to share a substantial amount
of fully integrated and tested work.  Members of the community have expressed interest in
seeing this tested work get checked into Apache and I would like to share it.  Mashing it
up with other patches would invalidate months of testing, defeating the purpose of the exercise.

If you are interested in integrating Append with this branch, why not create a 20.200 branch
and do so?

Unless you are vetoing the sharing of work as is on a branch (the purpose of the branch),
I suggest we move on.



On Jan 13, 2011, at 8:23 PM, Arun C Murthy wrote:

> On Jan 13, 2011, at 6:50 PM, Eli Collins wrote:
>> The cdh3 patch set Todd is talking about is not vanilla 104.3, it's
>> 104.3 re-based onto 20.2 plus patches from branch-20 and trunk (the
>> performance and stability fixes I think you're referring to, at least
>> the ones that have been posted to Apache jira).
>> Can you post a pointer to the version you're referring to, eg on
>> github?  If there isn't a big delta between it and the cdh3 patch set
>> (which should have the 20-based patches from jira) perhaps you and
>> Todd could easily merge in the delta to create 0.20.x?
> I can guarantee it will need work to merge the enhancements since
> 20.104.3; it's over 6 months of development. The enhancements include
> work on stability such as iterative ls, limits on JT to prevent single
> jobs/users from taking it down etc. and lots of bug-fixes to security.
> So, unfortunately the delta is pretty large.
> I'm working on a CHANGES.txt which should reflect all the changes i.e.  
> bug-fixes and enhancements.
>>> The version I'm offering to push to the community has fixed all of them,
>>> *plus* the added benefit of several stability and performance fixes we have
>>> done since 20.104.3, almost 10 internal releases. This is a battle tested
>>> and hardened version which we have deployed on 40,000+ nodes. It is a
>>> significant upgrade on which we never deployed. I'm pretty sure
>>> *some* users will find that valuable. ;)
>> Definitely, but better to hit two birds with one stone right?  Instead
>> of a security + enhancements release and an append release we could
>> have a single security + append + enhancements release and users don't
>> have to choose.
> We are discussing two options:
> 20 + security + enhancements
> 20 + security + append
> I think the value we provide via 20+security+enhancements release is
> that it's stable, tested and deployed at scale. Doing any more work
> merging 6 months of work at Yahoo (again, I guarantee it's a lot of
> work) will need a lot of cycles to validate, test and stabilize.
> I feel the alternative is a distraction for me, I'd rather work on 0.22.
> I can get 20+security+enhancements done very, very, quickly precisely  
> because I don't have to spend cycles testing it.
> Does that make sense? Thanks for being patient and bearing with me...
> Arun
