hadoop-general mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: [DISCUSSION] Proposal for making core Hadoop changes
Date Wed, 26 May 2010 10:24:15 GMT
Eli Collins wrote:

> The cost of adding features has gotten high anyway (even without
> branching). It's a classic trade-off -- merge overhead vs moving
> faster without burdening others -- as the overhead imposed on others
> increases, and tools (git) make it easier to live and collaborate on
> branches it makes more sense 

maybe, but if you are trying to keep more than one branch in sync, all
the low-cost refactorings become expensive to perform:
  - renaming variables
  - hitting the reformat-code button to align the code with the project
layout rules
  - moving methods around
Life is simplest if you own the entire codebase and can move stuff 
around without any discussion. Closed source projects can do that, but 
even then it annoys other team members. In any OSS project, keeping 
stuff more stable makes it easier to take in third party patches, and 
ensures that stack traces from various versions all point to roughly the 
same code, always handy. Once you try to keep multiple branches alive, 
it becomes very hard to do big changes in trunk.
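The merge treadmill described above can be sketched with plain git
commands. This is a minimal, made-up illustration -- the repository,
branch names, and file names are all hypothetical, not anything from the
Hadoop repos:

```shell
set -e
# Throwaway repository standing in for a project with a long-lived branch.
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git checkout -qb trunk
git config user.email "dev@example.com"
git config user.name "Dev"
echo 'int count;' > Core.java
git add Core.java
git commit -qm 'trunk: initial code'

# Start a feature branch, as a team working on a big change would.
git checkout -qb feature
echo 'new feature' > Feature.java
git add Feature.java
git commit -qm 'feature: new work'

# Meanwhile trunk keeps moving -- e.g. a low-cost rename lands there.
git checkout -q trunk
echo 'int recordCount;' > Core.java
git commit -qam 'trunk: rename variable'

# The recurring chore: pull trunk into the branch, then re-run the full
# test suite to find out whether the merged result still works.
git checkout -q feature
git merge -q --no-edit trunk
cat Core.java    # the branch now carries trunk's rename
```

The merge itself is the cheap part; the expense Steve points at is
everything after the last line -- rerunning the tests and working out
which side of the merge broke them.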

>(you don't need a team of engineers or
> dedicated merge engineer to maintain the branch).

No, but I'd estimate the cost of merging at 1-2 days' work a week just to 
pull in the code *and identify why the tests are failing*. Git may be 
better at merging in changes, but if Hadoop doesn't work on my machine 
after the merge, I need to identify whether it's my code, the merged 
code, some machine quirk, etc. It's the testing that is the problem for 
me, not the merge effort. That's both Hadoop's own tests and my own 
functional test suites, the ones that bring up clusters and push work 
through. Those are the trouble spots, as they do things that Hadoop's 
own tests don't do, like asking for all the JSP pages.

> Might find the
> following interesting:
> http://incubator.apache.org/learn/rules-for-revolutionaries.html

There's a long story behind JDD's paper. I'm glad you have read it; it 
does lay out what is effectively the ASF process for effecting 
significant change -but it doesn't imply that's the only process for 
making changes.

One of the big issues is that in any successful project it becomes hard 
to do a big rewrite, and you end up living with what was done early on, 
despite known issues. The "Some Thoughts on Ant 1.3 and 2.0" discussion 
is related to this: we -and I wasn't a committer at that time, just a 
user- weren't able to do the big rework, so we are left today with the 
design errors of the past (like the way undefined properties just get 
retained as ${undefined.property} instead of some kind of error 
appearing).
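For anyone who hasn't hit this Ant quirk: a minimal build file 
(illustrative only, not from any real project) shows the behaviour -- a 
reference to a property that was never defined survives expansion as 
literal text instead of failing the build:

```xml
<project name="demo" default="show">
  <target name="show">
    <!-- 'greeting' is never defined anywhere in this build file -->
    <echo message="greeting is ${greeting}"/>
    <!-- Ant echoes the literal text: greeting is ${greeting} -->
  </target>
</project>
```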

I think gradual evolution in trunk is good; it lets people play with 
what's coming in. Having lots of separate branches, with everyone's 
private release being a merge of whichever patches they choose, is bad, 
because it means my version != your version != anyone else's, which 
implies that your tests mean nothing to me unless I also test at scale. 
Which I can do, but with different hardware and network configs from 
other people, it's still tricky to assign blame. Is it my merge that 
isn't working, is it some quirk of virtualisation underneath, or is it 
just this week's trunk playing up?
