hadoop-general mailing list archives

From Rajiv Chittajallu <raj...@yahoo-inc.com>
Subject Re: Large feature development
Date Sat, 01 Sep 2012 21:29:56 GMT
It's unfortunate that certain work, a year after being accepted into the mainline, is being attributed
to a single person. A significant amount of work was done by people who are not on the
PMC or committers, especially to get it running in production. Those who have been associated
with running Hadoop since before it became synonymous with 'BigData' know that stabilizing a major release
takes time. With more critical systems dependent on Hadoop, transitioning to a new feature set
will take longer. hadoop-0.20 took ~8 months.

IMHO, months after a feature set is accepted into the mainline, it may not be appropriate
to question its quality.

In the next couple of months, we are planning to widely deploy the 0.23.3 release by Bobby. As with
any major release, I know this is not going to be a smooth ride.


----- Original Message -----
> From: Todd Lipcon <todd@cloudera.com>
> To: general@hadoop.apache.org
> Cc: 
> Sent: Saturday, September 1, 2012 1:20 AM
> Subject: Re: Large feature development
> Thanks for starting this thread, Steve. I think your points below are
> good. I've snipped most of your comment and will reply inline to one
> bit below:
> On Fri, Aug 31, 2012 at 10:07 AM, Steve Loughran
> <steve.loughran@gmail.com> wrote:
>>  Of the big changes that have worked, they are
>>     1. HDFS 2's HA and ongoing improvements: collaborative dev on the list
>>     with incremental changes going on in trunk, RTC with lots of tests. This
>>     isn't finished, and the test problem there is that functional testing of
>>     all failure modes requires software-controlled fencing devices and switches
>>     -and tests to generate the expected failure space.
> Actually, most of the HDFS HA code has been done on branches. The
> first work that led towards HA was the redesign of the edits logging
> infrastructure -- HDFS-1073. This was a feature branch with about 60
> patches on it. Then HDFS-1623, the main manual-failover HA
> development, had close to 150 patches on the branch. Automatic HA
> (HDFS-3042) was some 15-20 patches. The current work (removing
> dependency on NAS) is around 35 patches in so far and getting close to
> merge.
> In these various branches, we've experimented with a few policies
> which have differed from trunk. In particular:
> - HDFS-1073 had a "modified review then commit" policy, which was
> that, if a patch sat without a review for more than 24hrs, we
> committed it with the restriction that there would be a post-commit
> review before the branch was merged.
> - All of the branches have done away with the requirement of running
> the full QA suite, findbugs, etc prior to commit. This means that the
> branches at times have broken tests checked in, but also makes it
> quicker to iterate on the new feature. Again, the assumption is that
> these requirements are met before merge.
> - In all cases there has been a design doc and some good design
> discussion up front before substantial code was written. This made it
> easier to forge ahead on the branch with good confidence that the
> community was on-board with the idea.
> Given my experiences, I think all of the above are useful to follow.
> It means development can happen quickly, but ensures that when the
> merge is proposed, people feel like the quality meets our normal
> standards.
>>     2. YARN: Arun on his own branch, CTR, merge once mostly stable, and
>>     completely replacing MRv1.
> I'd actually contend that YARN was merged too early. I have yet to see
> anyone running YARN in production, and it's holding up the "Stable"
> moniker for Hadoop 2.0 -- HDFS-wise we are already quite stable and
> I'm seeing fewer issues in our customers running Hadoop HDFS 2
> compared to Hadoop 1-derived code.
>>  How then do we get (a) more dev projects working and integrated by the
>>  current committers, and (b) a process in which people who are not yet
>>  contributors/committers can develop non-trivial changes to the project in a
>>  way that it is done with the knowledge, support and mentorship of the rest
>>  of the community?
> Here's one proposal, making use of git as an easy way to allow
> non-committers to "commit" code while still tracking development in
> the usual places:
> - Upon anyone's request, we create a new "Version" tag in JIRA.
> - The developers create an umbrella JIRA for the project, and file the
> individual work items as subtasks (either up front, or as they are
> developed if using a more iterative model)
> - On the umbrella, they add a pointer to a git branch to be used as
> the staging area for the branch. As they develop each subtask, they
> can use the JIRA to discuss the development like they would with a
> normally committed JIRA, but when they feel it is ready to go (not
> requiring a +1 from any committer) they commit to their git branch
> instead of the SVN repo.
> - When the branch is ready to merge, they can call a merge vote, which
> requires +1 from 3 committers, same as a branch being proposed by an
> existing committer. A committer would then use git-svn to merge their
> branch commit-by-commit, or if it is less extensive, simply generate a
> single big patch to commit into SVN.
> My thinking is that this would provide a low-friction way for people
> to collaborate with the community and develop in the open, without
> having to work closely with any committer to review every individual
> subtask.
> Another alternative, if people are reluctant to use git, would be to
> add a "sandbox/" repository inside our SVN, and hand out commit bit to
> branches inside there without any PMC vote. Anyone interested in
> contributing could request a branch in the sandbox, and be granted
> access as soon as they get an apache SVN account.
> -Todd
> -- 
> Todd Lipcon
> Software Engineer, Cloudera
