Date: Thu, 5 May 2011 11:51:35 +0200
Subject: Re: [DISCUSSION] development process of Hadoop
From: Tony Valderrama
To: general@hadoop.apache.org

Hi,

I just
wanted to drop in a few thoughts from a new developer working outside of the Hadoop developer community.

On Wed, May 4, 2011 at 7:39 PM, Eric Yang wrote:
> While the world demand agility, the "review then commit" process is preventing progress
> from happening. People end up having to generate multiple version of patches to ensure
> the code can be applied. The large lag time between patch generation and reviewed
> is taking significant toll on the community and progress.
> Yahoo have a great team of developers who improves Hadoop at faster pace with its own
> fork of the source code. The reason that Yahoo was able to achieve faster improvement with
> features was due to the ability to use source code repository tools properly. Unfortunate
> for Yahoo, their source code repository was not Apache svn trunk.

I agree that the review process is broken. However, the current situation is exactly the result of a lack of adherence to this and other processes. Various subgroups within the community have (intentionally or unintentionally) hijacked the project at different times by avoiding community processes in the interest of agility or commercial benefit, and the result is a highly fragmented project with no clear direction.

From the outside, Hadoop looks like a Yahoo/Cloudera project which occasionally gets an Apache stamp. Given the lack of adherence to processes, as a non-Yahoo/Cloudera developer I have no way of breaking into the development community. Who's going to review or commit patches I submit? And which of the myriad versions should I even be trying to patch against? And given the speed with which undocumented changes are being made, how am I supposed to figure out if my changes are going to be relevant or viable next week? We'd love to contribute back, but it's just not clear that we or other small players have any place within the Hadoop developer community.
Here at Tuenti, like various other small-to-midsize Hadoop users, we've just forked 0.20 and devoted a couple of developers to maintaining features that we need. It would be nice to have shiny new features in the Yahoo branch or the Facebook branch or the Cloudera branch or the 0.22 branch (does Hadoop even have a trunk at the moment?), but we'll favor our own stable and familiar branch over the risky and hefty investment required to adopt a branch without clear community support.

> Use JIRA, if there is large feature set that requires brain storming, and developers
> should have the ability to make small incremental changes without RTC. This will ensure developers
> help each other rather than policing each other.

As an outsider, JIRA is the only way I've been able to follow the changes to Hadoop's code and guess where the project is heading. Permitting developers to commit without review or documentation will just further exclude anyone who can't walk down the hall and knock on an office door to ask about a commit.

Of course, take this with a grain of salt, since I don't claim to be a part of the Hadoop developer community and I don't foresee Tuenti ever playing a major role in the developer community.

~Tony