hadoop-common-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Hadoop 0.19.1
Date Fri, 06 Feb 2009 18:35:26 GMT
Sanjay Radia wrote:
> For me the lesson is that large complex projects should be branched.

We already maintain release branches.  What's under discussion is the 
maintenance of feature branches.  We do this today through patch files, 
merging each time they are applied.  The proposal is to use a source 
code management tool to manage feature branches, which would be merged 
less often, but using better merge tools.

FWIW, Nutch's transition to mapreduce (in the nascent days of Hadoop) 
was managed as a feature branch.  Mike & I worked in a branch for about 
six months before merging back to the trunk.  For a change of that 
magnitude, we found this to be much simpler than updating a patch.

So we need concrete proposals of features that deserve a branch.  In 
retrospect, it seems like append may have been better handled in a 
branch than as a patch, but that's hindsight.  What future features do 
we feel demand this?

One possibility is to manage the project split in a branch.  We could 
start new repos for the hdfs and mapred sub-projects, but branch the 
core repo.  Then changes could continue to be applied to trunk while the 
details of the split are worked out.  Core changes would be merged to 
the new subprojects and the branch while this is in progress. Once we 
feel the split is solid, we can merge the core branch to trunk and open 
the new subprojects for business.

If we change the RPC system, or switch DFS client to use RPC instead of 
sockets, etc., we might want to do these in a branch since they'll touch 
a lot of code and will require extensive testing before we release them.

I don't think this is fundamentally a policy issue.  We still want to 
demand that things are well tested before we commit them to trunk.  The 
append code was committed to trunk prematurely, perhaps since managing 
it as a patch was awkward.  So this is a praxis issue.  For features that 
take a long time to develop, that we do not want to be forced to 
prematurely commit, a branch is perhaps a better mechanism than a patch.

So I think, if a committer feels a feature requires a branch, then they 
can propose that, and if no one objects, they can create it and maintain 
it.  The final commit to trunk is what we should watch most closely, 
since that's the event that corresponds to a commit today.  Commits to a 
feature branch should not require reviews, since these are equivalent to 
updating a patch.

Does this sound reasonable?

