Mailing-List: contact general-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: general@hadoop.apache.org
MIME-Version: 1.0
Date: Thu, 5 May 2011 11:51:35 +0200
Message-ID: <BANLkTimgKU8p0QLKtVC+xrdSR9Vv5gjx5w@mail.gmail.com>
Subject: Re: [DISCUSSION] development process of Hadoop
From: Tony Valderrama <tvalderrama@tuenti.com>
To: general@hadoop.apache.org
Content-Type: text/plain; charset=ISO-8859-1

Hi, I just wanted to drop in a few thoughts from a new developer
working outside of the Hadoop developer community.

On Wed, May 4, 2011 at 7:39 PM, Eric Yang <eyang@yahoo-inc.com> wrote:
> While the world demand agility, the "review then commit" process is preventing progress
> from happening.  People end up having to generate multiple version of patches to ensure
> the code can be applied.  The large lag time between patch generation and reviewed
> is taking significant toll on the community and progress.

> Yahoo have a great team of developers who improves Hadoop at faster pace with its own
> fork of the source code.  The reason that Yahoo was able to achieve faster improvement with
> features was due to the ability to use source code repository tools properly.  Unfortunate
> for Yahoo, their source code repository was not Apache svn trunk.

I agree that the review process is broken.  However, the current
situation is exactly the result of a lack of adherence to this and
other processes.  Various subgroups within the community have
(intentionally or unintentionally) hijacked the project at different
times by avoiding community processes in the interest of agility or
commercial benefit, and the result is a highly fragmented project with
no clear direction.

From the outside, Hadoop looks like a Yahoo/Cloudera project which
occasionally gets an Apache stamp.  Given the lack of adherence to
processes, as a non-Yahoo/Cloudera developer I have no way of breaking
into the development community.  Who's going to review or commit
patches I submit?  And which of the myriad versions should I even be
trying to patch against?  And given the speed with which undocumented
changes are being made, how am I supposed to figure out if my changes
are going to be relevant or viable next week?  We'd love to contribute
back, but it's just not clear that we or other small players have any
place within the Hadoop developer community.

Here at Tuenti, like various other small-to-midsize Hadoop users,
we've just forked 0.20 and devoted a couple of developers to
maintaining features that we need.  It would be nice to have shiny new
features in the Yahoo branch or the Facebook branch or the Cloudera
branch or the 0.22 branch (does Hadoop even have a trunk at the
moment?), but we'll favor our own stable and familiar branch over the
risky and hefty investment required to adopt a branch without clear
community support.


> Use JIRA, if there is large feature set that requires brain storming, and developers
> should have the ability to make small incremental changes without RTC.  This will ensure developers
> help each other rather than policing each other.

As an outsider, I've found JIRA to be the only way to follow the
changes to Hadoop's code and guess where the project is heading.
Permitting developers to commit without review or documentation will
just further exclude anyone who can't walk down the hall and knock on
an office door to ask about a commit.


Of course, take this with a grain of salt, since I don't claim to be a
part of the Hadoop developer community and I don't foresee Tuenti ever
playing a major role in the developer community.
~Tony