pig-dev mailing list archives

From Alan Gates <ga...@yahoo-inc.com>
Subject Re: Begin a discussion about Pig as a top level project
Date Wed, 31 Mar 2010 21:38:13 GMT
So far I haven't seen any feedback on this.  Apache has asked the  
Hadoop PMC to submit input in April on whether some subprojects should  
be promoted to TLPs.  We, the Pig community, need to give feedback to  
the Hadoop PMC on how we feel about this.  Please make your voice heard.

So now I'll heed my own call and give my thoughts on it.

The biggest advantage I see to being a TLP is a direct connection to  
Apache.  Right now all of the Pig team's interaction with Apache is  
through the Hadoop PMC.  Being directly connected to Apache would  
benefit Pig team members who would have a better view into Apache.  It  
would also raise our profile in Apache and thus make other projects  
more aware of us.

However, I am concerned about losing Pig's explicit connection to  
Hadoop.  This concern has a couple of dimensions.  One, Hadoop and  
MapReduce are the current flavor of the month in computing.  Given  
that Pig shares a name with the common farm animal, it's hard to be  
sure based on search statistics.  But Google trends shows that  
"hadoop" is searched on much more frequently than "hadoop pig" or  
"apache pig" (see  
http://www.google.com/trends?q=hadoop%2Chadoop+pig).  I am guessing  
that most Pig users are Hadoop users who discover Pig via Hadoop's  
website.  Losing that subproject tab on Hadoop's front page may  
radically lower the number of users coming to Pig to check out our  
project.  I would argue that keeping the connection benefits Hadoop  
as well, since high level languages like Pig Latin have the potential  
to greatly extend the user base and usability of Hadoop.

Two, being explicitly connected to Hadoop keeps our two communities  
aware of each other's needs.  There are features proposed for MR that  
would greatly help Pig.  By staying in the Hadoop community Pig is  
better positioned to advocate for and help implement and test those  
features.  The response to this will be that Pig developers can still  
subscribe to Hadoop mailing lists, submit patches, etc.  That is, they  
can still be part of the Hadoop community.  Which reinforces my point  
that it makes more sense to leave Pig in the Hadoop community since  
Pig developers will need to be part of that community anyway.

Finally, philosophically it makes sense to me that projects that are  
tightly connected belong together.  It strikes me as strange to have  
Pig as a TLP completely dependent on another TLP.  Hadoop was  
originally a subproject of Lucene.  It moved out to be a TLP when it  
became obvious that Hadoop had become independent of and useful apart  
from Lucene.  Pig is not in that position relative to Hadoop.

So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to  
being persuaded that I'm wrong or my concerns can be addressed while  
still having Pig as a TLP.


On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:

> You have probably heard by now that there is a discussion going on  
> in the Hadoop PMC as to whether a number of the subprojects (Hbase,  
> Avro, Zookeeper, Hive, and Pig) should move out from under the  
> Hadoop umbrella and become top level Apache projects (TLP).  This  
> discussion has picked up recently since the Apache board has clearly  
> communicated to the Hadoop PMC that it is concerned that Hadoop is  
> acting as an umbrella project with many disjoint subprojects  
> underneath it.  They are concerned that this gives Apache little  
> insight into the health and happenings of the subproject communities  
> which in turn means Apache cannot properly mentor those communities.
>
> The purpose of this email is to start a discussion within the Pig  
> community about this topic.  Let me cover first what becoming TLP  
> would mean for Pig, and then I'll go into what options I think we as  
> a community have.
>
> Becoming a TLP would mean that Pig would itself have a PMC that  
> would report directly to the Apache board.  Who would be on the PMC  
> would be something we as a community would need to decide.  Common  
> options would be to say all active committers are on the PMC, or all  
> active committers who have been a committer for at least a year.  We  
> would also need to elect a chair of the PMC.  This lucky person  
> would have no additional power, but would have the additional  
> responsibility of writing quarterly reports on Pig's status for  
> Apache board meetings, as well as coordinating with Apache to get  
> accounts for new committers, etc.  For more information see  
> http://www.apache.org/foundation/how-it-works.html#roles
>
> Becoming a TLP would not mean that we are ostracized from the Hadoop  
> community.  We would continue to be invited to Hadoop Summits, HUGs,  
> etc.  Since all Pig developers and users are by definition Hadoop  
> users, we would continue to be a strong presence in the Hadoop  
> community.
>
> I see three ways that we as a community can respond to this:
> 1) Say yes, we want to be a TLP now.
> 2) Say yes, we want to be a TLP, but not yet.  We feel we need more  
> time to mature.  If we choose this option we need to be able to  
> clearly articulate how much time we need and what we hope to see  
> change in that time.
> 3) Say no, we feel the benefits for us staying with Hadoop outweigh  
> the drawbacks of being a disjoint subproject.  If we choose this, we  
> need to be able to say exactly what those benefits are and why we  
> feel they will be compromised by leaving the Hadoop project.
>
> There may be other options that I haven't thought of.  Please feel free  
> to suggest any you think of.
>
> Questions?  Thoughts?  Let the discussion begin.
> Alan.
