hadoop-common-dev mailing list archives

From Steve Loughran <ste...@hortonworks.com>
Subject Re: commit access to hadoop
Date Wed, 28 Nov 2012 10:12:58 GMT
On 26 November 2012 21:25, Radim Kolar <hsn@filez.com> wrote:

>> The main "feature" is that when you get the +1 vote you yourself get to
>> deal with the grunge work of applying patches to one or more svn branches,
>> resyncing that with the git branches you inevitably do your own work on.
>
> No, the main feature is a major speed advantage. It takes forever to get
> something committed. I was annoyed with apache nutch last year and forked
> it; here is a snapshot from the forked codebase:
> http://forum.lupa.cz/index.php?action=dlattach;topic=1674.0;attach=3439
> It is now 160k LOC on top of apache nutch 1.4. If I worked with these guys,
> it would never be done, because it took them 4 months to get a 200-line
> patch reviewed.
I'm sorry you missed the bit in my slides where I emphasised that
review-then-commit is the same rule even if you are a committer. It's not
like you can suddenly put changes in without having gone through the JIRA
circuit. I also tried to explain why the project is so rigorous:

The value of Hadoop is the data stored in HDFS.

Imagine someone could put some minor bit of tuning in there that sped up
their cluster slightly but increased the risk of data loss, or something in
the MR layer that introduced enough of a performance overhead that someone
like Facebook would have to buy an extra rack of machines. That's why
there's a review process. Try getting a patch into ext4 or the Linux kernel
scheduler and see if it's any easier.

> Hadoop has a huge backlog of patches; you need way more committers than
> you have today. I simply could not assign a person to work on hadoop
> full-time, because if he submits a mere 5 patches per day, you will never
> be able to process them.
The bottleneck is not the number of committers, it is the number of people
who understand hadoop well enough to provide adequate reviews, and who have
the time to review patches thoroughly, especially the big ones. I think
that is a real problem.

> Your current development process fails to scale. What are your plans for
> making development move faster?

I don't disagree; again, in my slides I tried to make some proposals.

   1. Even if the source stays in SVN, we could use a git-style workflow of
   pull requests and gerrit/github code reviewing.
   2. Better distributed development events, where a group of people can go
   online via a google+ hangout and work together on a specific problem.
   3. More rigorous "review sundays" or similar, where we go through the
   review queue on a free weekend day and see what can be done about them.
   4. Some kind of mentorship process to work with people on larger
   projects. Again, time is the constraint here.
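To make item 1 concrete, here is a rough contributor-side sketch of what a
git-based flow feeding the existing JIRA/patch process could look like. The
repository, branch name, and JIRA id below are invented for illustration,
not actual project conventions:

```shell
# Hypothetical sketch: a contributor works on a local git repo (standing in
# for a read-only git mirror of the svn trunk), keeps one topic branch per
# JIRA issue, and produces a patch file to attach to the JIRA for review.
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q hadoop-mirror
cd hadoop-mirror
git config user.email contributor@example.com
git config user.name "A Contributor"

# Simulate the mirrored trunk with a baseline commit.
echo "base" > README
git add README
git commit -q -m "trunk baseline"

# One topic branch per issue keeps the change reviewable in isolation.
git checkout -q -b HADOOP-1234
echo "fix" >> README
git commit -q -a -m "HADOOP-1234: example fix"

# Export the change as a patch to attach to the JIRA issue for review.
git format-patch -1 --stdout > HADOOP-1234.patch
ls HADOOP-1234.patch
```

The point of the sketch is that review-then-commit stays intact: the patch
still goes through JIRA, but contributors get cheap local branching and a
pull-request-shaped artifact reviewers can comment on.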

If you've got some other ideas, it'd be good to know them.
