hadoop-pig-dev mailing list archives

From Dmitriy Ryaboy <dvrya...@gmail.com>
Subject Re: Begin a discussion about Pig as a top level project
Date Wed, 31 Mar 2010 23:04:45 GMT
Over time, Pig is increasing its coupling to Hadoop (for good reasons),
rather than decreasing it. If and when Pig becomes a viable entity without
Hadoop around, it might make sense as a TLP. As is, I think becoming a TLP
will only introduce unnecessary administrative and bureaucratic headaches.
So my vote is also -1.

-Dmitriy



On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <gates@yahoo-inc.com> wrote:

> So far I haven't seen any feedback on this.  Apache has asked the Hadoop
> PMC to submit input in April on whether some subprojects should be promoted
> to TLPs.  We, the Pig community, need to give feedback to the Hadoop PMC on
> how we feel about this.  Please make your voice heard.
>
> So now I'll heed my own call and give my thoughts on it.
>
> The biggest advantage I see to being a TLP is a direct connection to
> Apache.  Right now all of the Pig team's interaction with Apache is through
> the Hadoop PMC.  Being directly connected to Apache would benefit Pig team
> members who would have a better view into Apache.  It would also raise our
> profile in Apache and thus make other projects more aware of us.
>
> However, I am concerned about losing Pig's explicit connection to Hadoop.
>  This concern has a couple of dimensions.  One, Hadoop and MapReduce are the
> current flavor of the month in computing.  Given that Pig shares a name with
> the common farm animal, it's hard to be sure based on search statistics.
>  But Google trends shows that "hadoop" is searched on much more frequently
> than "hadoop pig" or "apache pig" (see
> http://www.google.com/trends?q=hadoop%2Chadoop+pig).  I am guessing that
> most Pig users come from Hadoop users who discover Pig via Hadoop's website.
>  Losing that subproject tab on Hadoop's front page may radically lower the
> number of users coming to Pig to check out our project.  I would argue that
> this benefits Hadoop as well, since high level languages like Pig Latin have
> the potential to greatly extend the user base and usability of Hadoop.
>
> Two, being explicitly connected to Hadoop keeps our two communities aware
> of each other's needs.  There are features proposed for MR that would greatly
> help Pig.  By staying in the Hadoop community Pig is better positioned to
> advocate for and help implement and test those features.  The response to
> this will be that Pig developers can still subscribe to Hadoop mailing
> lists, submit patches, etc.  That is, they can still be part of the Hadoop
> community.  Which reinforces my point that it makes more sense to leave Pig
> in the Hadoop community since Pig developers will need to be part of that
> community anyway.
>
> Finally, philosophically it makes sense to me that projects that are
> tightly connected belong together.  It strikes me as strange to have Pig as
> a TLP completely dependent on another TLP.  Hadoop was originally a
> subproject of Lucene.  It moved out to be a TLP when it became obvious that
> Hadoop had become independent of and useful apart from Lucene.  Pig is not
> in that position relative to Hadoop.
>
> So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to being
> persuaded that I'm wrong or my concerns can be addressed while still having
> Pig as a TLP.
>
> Alan.
>
>
> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
>
>> You have probably heard by now that there is a discussion going on in the
>> Hadoop PMC as to whether a number of the subprojects (HBase, Avro,
>> ZooKeeper, Hive, and Pig) should move out from under the Hadoop umbrella and
>> become top level Apache projects (TLP).  This discussion has picked up
>> recently since the Apache board has clearly communicated to the Hadoop PMC
>> that it is concerned that Hadoop is acting as an umbrella project with many
>> disjoint subprojects underneath it.  They are concerned that this gives
>> Apache little insight into the health and happenings of the subproject
>> communities which in turn means Apache cannot properly mentor those
>> communities.
>>
>> The purpose of this email is to start a discussion within the Pig
>> community about this topic.  Let me cover first what becoming TLP would mean
>> for Pig, and then I'll go into what options I think we as a community have.
>>
>> Becoming a TLP would mean that Pig would itself have a PMC that would
>> report directly to the Apache board.  Who would be on the PMC would be
>> something we as a community would need to decide.  Common options would be
>> to say all active committers are on the PMC, or all active committers who
>> have been a committer for at least a year.  We would also need to elect a
>> chair of the PMC.  This lucky person would have no additional power, but
>> would have the additional responsibility of writing quarterly reports on
>> Pig's status for Apache board meetings, as well as coordinating with Apache
>> to get accounts for new  committers, etc.  For more information see
>> http://www.apache.org/foundation/how-it-works.html#roles
>>
>> Becoming a TLP would not mean that we are ostracized from the Hadoop
>> community.  We would continue to be invited to Hadoop Summits, HUGs, etc.
>>  Since all Pig developers and users are by definition Hadoop users, we would
>> continue to be a strong presence in the Hadoop community.
>>
>> I see three ways that we as a community can respond to this:
>>
>> 1) Say yes, we want to be a TLP now.
>> 2) Say yes, we want to be a TLP, but not yet.  We feel we need more time
>> to mature.  If we choose this option we need to be able to clearly
>> articulate how much time we need and what we hope to see change in that
>> time.
>> 3) Say no, we feel the benefits for us staying with Hadoop outweigh the
>> drawbacks of being a disjoint subproject.  If we choose this, we need to be
>> able to say exactly what those benefits are and why we feel they will be
>> compromised by leaving the Hadoop project.
>>
>> There may be other options that I haven't thought of.  Please feel free to
>> suggest any you think of.
>>
>> Questions?  Thoughts?  Let the discussion begin.
>>
>> Alan.
>>
>>
>
