hadoop-general mailing list archives

From Steve Loughran <ste...@apache.org>
Subject Re: Discussions - Re: [VOTE] Release candidate
Date Tue, 03 May 2011 14:17:07 GMT
On 03/05/11 01:41, Roy T. Fielding wrote:
> I am constantly amazed at how
> quiet it is in this project, at least until I remember that
> most of the work is done exclusively via jira, unlike any of
> my other followed projects that use jira.  I'd suggest that
> the right place to hold any discussion is on the dev list,
> but I am not on that list because it receives way too many
> automated notifications.  Maybe it would help discussion on
> dev if notices were sent elsewhere and only discussions were
> held on dev.

I've seen this before on the Maven lists, where there's mostly a stream 
of JIRA changes above anything else:

However, they've got no JIRA issues in their list now, which may imply
that not all changes are going to the list, or that they aren't using
it so much:

(pause: bisecting their list shows that in 1.mar.06 they forked JIRA to 
a separate list to hide the details of ongoing work)

In some ways it's a means of dealing with a large and fast moving 
codebase: you subscribe to the issues that matter to you, all the 
discussions on a specific feature are archived, etc.

However, it has some flaws:
  -discouragement of community, you become a group of people working on 
JIRA issues, rather than on a large integrated project
  -with work spread across common, hdfs and mapreduce JIRAs and mailing 
lists, it's hard to keep all the things in your head. It is pretty much 
a full-time job to do so. And I don't know about the others, but I don't 
have the time.
  -we need a way of gently moving people from being users of hadoop to 
being developers of it. To me, every end user is a warm engineering 
resource we just need to point at a problem that they care about. The 
scale of the project, its complexity, JIRA change rate and testing 
difficulties are all barriers to entry: you end up needing a team of people:
  * someone to track all the issues and keep the design in their head
  * 1+ person to test
  * 1+ person to code
I don't know about others, but I can't do this on my own.

The attempt to split up into HDFS+MAPREDUCE was one tactic to deal with 
this, but it hasn't worked; we just have more mailing lists to track (or, 
in my case, fall behind on).


-I'm in favour of shipping an apache release of 20.x that has the patches 
that Y! and others have added to deal with scale and availability, and 
which has been tested by them. This will provide an apache release for 
people to use in production systems, because the official apache 
releases have lagged the CDH and Y! releases.

-I'd like to see all the changes integrated into trunk too, as it 
doesn't make sense for a patch in this branch not to be in trunk.

