pig-dev mailing list archives

From Thejas Nair <te...@yahoo-inc.com>
Subject Re: Begin a discussion about Pig as a top level project
Date Fri, 02 Apr 2010 23:08:23 GMT
I agree with Alan and Dmitriy - Pig is tightly coupled with Hadoop, and
heavily influenced by its roadmap. I think it makes sense to continue as a
sub-project of hadoop.


On 3/31/10 4:04 PM, "Dmitriy Ryaboy" <dvryaboy@gmail.com> wrote:

> Over time, Pig is increasing its coupling to Hadoop (for good reasons),
> rather than decreasing it. If and when Pig becomes a viable entity without
> Hadoop around, it might make sense as a TLP. As is, I think becoming a TLP
> will only introduce unnecessary administrative and bureaucratic headaches.
> So my vote is also -1.
> -Dmitriy
> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <gates@yahoo-inc.com> wrote:
>> So far I haven't seen any feedback on this.  Apache has asked the Hadoop
>> PMC to submit input in April on whether some subprojects should be promoted
>> to TLPs.  We, the Pig community, need to give feedback to the Hadoop PMC on
>> how we feel about this.  Please make your voice heard.
>> So now I'll heed my own call and give my thoughts on it.
>> The biggest advantage I see to being a TLP is a direct connection to
>> Apache.  Right now all of the Pig team's interaction with Apache is through
>> the Hadoop PMC.  Being directly connected to Apache would benefit Pig team
>> members who would have a better view into Apache.  It would also raise our
>> profile in Apache and thus make other projects more aware of us.
>> However, I am concerned about losing Pig's explicit connection to Hadoop.
>>  This concern has a couple of dimensions.  One, Hadoop and MapReduce are the
>> current flavor of the month in computing.  Given that Pig shares a name with
>> the common farm animal, it's hard to be sure based on search statistics.
>>  But Google trends shows that "hadoop" is searched on much more frequently
>> than "hadoop pig" or "apache pig" (see
>> http://www.google.com/trends?q=hadoop%2Chadoop+pig).  I am guessing that
>> most Pig users come from Hadoop users who discover Pig via Hadoop's website.
>>  Losing that subproject tab on Hadoop's front page may radically lower the
>> number of users coming to Pig to check out our project.  I would argue that
>> this benefits Hadoop as well, since high level languages like Pig Latin have
>> the potential to greatly extend the user base and usability of Hadoop.
>> Two, being explicitly connected to Hadoop keeps our two communities aware
>> of each other's needs.  There are features proposed for MR that would greatly
>> help Pig.  By staying in the Hadoop community Pig is better positioned to
>> advocate for and help implement and test those features.  The response to
>> this will be that Pig developers can still subscribe to Hadoop mailing
>> lists, submit patches, etc.  That is, they can still be part of the Hadoop
>> community.  Which reinforces my point that it makes more sense to leave Pig
>> in the Hadoop community since Pig developers will need to be part of that
>> community anyway.
>> Finally, philosophically it makes sense to me that projects that are
>> tightly connected belong together.  It strikes me as strange to have Pig as
>> a TLP completely dependent on another TLP.  Hadoop was originally a
>> subproject of Lucene.  It moved out to be a TLP when it became obvious that
>> Hadoop had become independent of and useful apart from Lucene.  Pig is not
>> in that position relative to Hadoop.
>> So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to being
>> persuaded that I'm wrong or my concerns can be addressed while still having
>> Pig as a TLP.
>> Alan.
>> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
>>> You have probably heard by now that there is a discussion going on in the
>>> Hadoop PMC as to whether a number of the subprojects (Hbase, Avro,
>>> Zookeeper, Hive, and Pig) should move out from under the Hadoop umbrella and
>>> become top level Apache projects (TLP).  This discussion has picked up
>>> recently since the Apache board has clearly communicated to the Hadoop PMC
>>> that it is concerned that Hadoop is acting as an umbrella project with many
>>> disjoint subprojects underneath it.  They are concerned that this gives
>>> Apache little insight into the health and happenings of the subproject
>>> communities which in turn means Apache cannot properly mentor those
>>> communities.
>>> The purpose of this email is to start a discussion within the Pig
>>> community about this topic.  Let me cover first what becoming TLP would mean
>>> for Pig, and then I'll go into what options I think we as a community have.
>>> Becoming a TLP would mean that Pig would itself have a PMC that would
>>> report directly to the Apache board.  Who would be on the PMC would be
>>> something we as a community would need to decide.  Common options would be
>>> to say all active committers are on the PMC, or all active committers who
>>> have been a committer for at least a year.  We would also need to elect a
>>> chair of the PMC.  This lucky person would have no additional power, but
>>> would have the additional responsibility of writing quarterly reports on
>>> Pig's status for Apache board meetings, as well as coordinating with Apache
>>> to get accounts for new committers, etc.  For more information see
>>> http://www.apache.org/foundation/how-it-works.html#roles
>>> Becoming a TLP would not mean that we are ostracized from the Hadoop
>>> community.  We would continue to be invited to Hadoop Summits, HUGs, etc.
>>>  Since all Pig developers and users are by definition Hadoop users, we would
>>> continue to be a strong presence in the Hadoop community.
>>> I see three ways that we as a community can respond to this:
>>> 1) Say yes, we want to be a TLP now.
>>> 2) Say yes, we want to be a TLP, but not yet.  We feel we need more time
>>> to mature.  If we choose this option we need to be able to clearly
>>> articulate how much time we need and what we hope to see change in that
>>> time.
>>> 3) Say no, we feel the benefits for us staying with Hadoop outweigh the
>>> drawbacks of being a disjoint subproject.  If we choose this, we need to be
>>> able to say exactly what those benefits are and why we feel they will be
>>> compromised by leaving the Hadoop project.
>>> There may be other options that I haven't thought of.  Please feel free to
>>> suggest any you think of.
>>> Questions?  Thoughts?  Let the discussion begin.
>>> Alan.
