hadoop-pig-dev mailing list archives

From "Daniel Dai" <dai...@gmail.com>
Subject Re: Begin a discussion about Pig as a top level project
Date Mon, 05 Apr 2010 22:02:42 GMT
I agree with the stance that we remain in Hadoop until we see more
compelling reasons, such as Pig growing beyond Hadoop. Currently I cannot
fully weigh the advantages and disadvantages of becoming a TLP. But given
that this is a point of no return, I don't want to move unless we have a
strong motivation. We can always choose to become a TLP later, when we
feel more convinced of that.

Daniel

--------------------------------------------------
From: "Santhosh Srinivasan" <sms@yahoo-inc.com>
Sent: Monday, April 05, 2010 12:22 PM
To: <pig-dev@hadoop.apache.org>
Subject: RE: Begin a discussion about Pig as a top level project

> "Given that, do you think it makes
> sense to say that Pig stays a subproject for now, but if it someday
> grows beyond Hadoop only it becomes a TLP?  I could agree to that
> stance."
>
> Bingo!
>
> Santhosh
>
> -----Original Message-----
> From: Alan Gates [mailto:gates@yahoo-inc.com]
> Sent: Monday, April 05, 2010 11:37 AM
> To: pig-dev@hadoop.apache.org
> Subject: Re: Begin a discussion about Pig as a top level project
>
> Prognostication is a difficult business.  Of course I'd love it if
> someday there is an ISO Pig Latin committee (with meetings in cool
> exotic places) deciding the official standard for Pig Latin.  But that
> seems like saying in your start up's business plan, "When we reach
> Google's size, then we'll do x".  If there ever is an ISO Pig Latin
> standard it will be years off.
>
> As others have noted, staying tight to Hadoop now has many advantages,
> both in technical and adoption terms.  Hence my advocacy of keeping
> Pig Latin Hadoop agnostic while tightly integrating the backend.
> Which is to say that in my view, Pig is Hadoop specific now, but there
> may come a day when that is no longer true.   Whether Pig will ever
> move past just running on Hadoop to running in other parallel systems
> won't be known for years to come.  Given that, do you think it makes
> sense to say that Pig stays a subproject for now, but if it someday
> grows beyond Hadoop only it becomes a TLP?  I could agree to that
> stance.
>
> Alan.
>
> On Apr 3, 2010, at 12:43 PM, Santhosh Srinivasan wrote:
>
>> I see this as a multi-part question. Looking back at some of the
>> significant roadmap/existential questions asked in the last 12
>> months, I see the following:
>>
>> 1. With the introduction of SQL, what is the philosophy of Pig (I sent
>> an email about this approximately 9 months ago)
>> 2. What is the approach to support backward compatibility in Pig (Alan
>> had sent an email about this 3 months ago)
>> 3. Should Pig be a TLP (the current email thread).
>>
>> Here is my take on answering the aforementioned questions.
>>
>> The initial philosophy of Pig was to be backend agnostic. It was
>> designed as a data flow language. Whenever a new language is designed,
>> the syntax and semantics of the language have to be laid out. The
>> syntax is usually captured in the form of a BNF grammar. The semantics
>> are defined by the language creators. Backward compatibility is then a
>> question of holding true to the syntax and semantics. With Pig, in
>> addition to the language, the Java APIs were exposed to customers to
>> implement UDFs (load/store/filter/grouping/row transformation etc.),
>> to provide looping, since the language does not support looping
>> constructs, and to support a programmatic mode of access. Backward
>> compatibility in this context means supporting API versioning.
>>
>> Do we still intend to position Pig as a data flow language that is
>> backend agnostic? If the answer is yes, then there is a strong case
>> for making Pig a TLP.
>>
>> Are we influenced by Hadoop? A big YES! The reason Pig chose to
>> become a Hadoop sub-project was to ride the Hadoop popularity wave.
>> As a consequence, we chose to be heavily influenced by the Hadoop
>> roadmap.
>>
>> Like a good lawyer, I also have rebuttals to Alan's questions :)
>>
>> 1. Search engine popularity - We can discuss this with the Hadoop team
>> and still retain links to TLPs that are coupled (loosely or tightly).
>> 2. Explicit connection to Hadoop - I see this as logical connection
>> vs. physical connection. Today, we are physically connected as a
>> sub-project. Becoming a TLP will not increase/decrease our influence
>> on the Hadoop community (think Logical, Physical and MR Layers :)
>> 3. Philosophy - I have already talked about this. The tight coupling
>> is by choice. If Pig continues to be a data flow language with clear
>> syntax and semantics then someone can implement Pig on top of a
>> different backend. Do we intend to take this approach?
>>
>> I just wanted to offer a different opinion to this thread. I strongly
>> believe that we should think about the original philosophy. Will we
>> have a Pig standards committee that will decide on the changes to the
>> language (think C/C++) if there are multiple backend implementations?
>>
>> I will reserve my vote based on the outcome of the philosophy and
>> backward compatibility discussions. If we decide that Pig will be
>> treated and maintained like a true language with clear syntax and
>> semantics then we have a strong case to make it into a TLP. If not, we
>> should retain our existing ties to Hadoop and make Pig into a data
>> flow language for Hadoop.
>>
>> Santhosh
>>
>> -----Original Message-----
>> From: Thejas Nair [mailto:tejas@yahoo-inc.com]
>> Sent: Friday, April 02, 2010 4:08 PM
>> To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
>> Subject: Re: Begin a discussion about Pig as a top level project
>>
>> I agree with Alan and Dmitriy - Pig is tightly coupled with Hadoop,
>> and heavily influenced by its roadmap. I think it makes sense to
>> continue as a sub-project of Hadoop.
>>
>> -Thejas
>>
>>
>>
>> On 3/31/10 4:04 PM, "Dmitriy Ryaboy" <dvryaboy@gmail.com> wrote:
>>
>>> Over time, Pig is increasing its coupling to Hadoop (for good
>>> reasons), rather than decreasing it. If and when Pig becomes a viable
>>> entity without Hadoop around, it might make sense as a TLP. As is, I
>>> think becoming a TLP will only introduce unnecessary administrative
>>> and bureaucratic headaches.
>>> So my vote is also -1.
>>>
>>> -Dmitriy
>>>
>>>
>>>
>>> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <gates@yahoo-inc.com> wrote:
>>>
>>>> So far I haven't seen any feedback on this.  Apache has asked the
>>>> Hadoop PMC to submit input in April on whether some subprojects
>>>> should be promoted to TLPs.  We, the Pig community, need to give
>>>> feedback to the Hadoop PMC on how we feel about this.  Please make
>>>> your voice heard.
>>>>
>>>> So now I'll heed my own call and give my thoughts on it.
>>>>
>>>> The biggest advantage I see to being a TLP is a direct connection to
>>>> Apache.  Right now all of the Pig team's interaction with Apache is
>>>> through the Hadoop PMC.  Being directly connected to Apache would
>>>> benefit Pig team members who would have a better view into Apache.
>>>> It would also raise our profile in Apache and thus make other
>>>> projects more aware of us.
>>>>
>>>> However, I am concerned about losing Pig's explicit connection to
>>>> Hadoop.  This concern has a couple of dimensions.  One, Hadoop and
>>>> MapReduce are the current flavor of the month in computing.  Given
>>>> that Pig shares a name with the common farm animal, it's hard to be
>>>> sure based on search statistics.  But Google Trends shows that
>>>> "hadoop" is searched on much more frequently than "hadoop pig" or
>>>> "apache pig" (see
>>>> http://www.google.com/trends?q=hadoop%2Chadoop+pig).  I am guessing
>>>> that most Pig users come from Hadoop users who discover Pig via
>>>> Hadoop's website.  Losing that subproject tab on Hadoop's front page
>>>> may radically lower the number of users coming to Pig to check out
>>>> our project.  I would argue that the connection benefits Hadoop as
>>>> well, since high level languages like Pig Latin have the potential
>>>> to greatly extend the user base and usability of Hadoop.
>>>>
>>>> Two, being explicitly connected to Hadoop keeps our two communities
>>>> aware of each other's needs.  There are features proposed for MR that
>>>> would greatly help Pig.  By staying in the Hadoop community Pig is
>>>> better positioned to advocate for and help implement and test those
>>>> features.  The response to this will be that Pig developers can
>>>> still subscribe to Hadoop mailing lists, submit patches, etc.  That
>>>> is, they can still be part of the Hadoop community.  Which reinforces
>>>> my point that it makes more sense to leave Pig in the Hadoop
>>>> community since Pig developers will need to be part of that community
>>>> anyway.
>>>>
>>>> Finally, philosophically it makes sense to me that projects that are
>>>> tightly connected belong together.  It strikes me as strange to have
>>>> Pig as a TLP completely dependent on another TLP.  Hadoop was
>>>> originally a subproject of Lucene.  It moved out to be a TLP when it
>>>> became obvious that Hadoop had become independent of and useful
>>>> apart from Lucene.  Pig is not in that position relative to Hadoop.
>>>>
>>>> So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to
>>>> being persuaded that I'm wrong or my concerns can be addressed while
>>>> still having Pig as a TLP.
>>>>
>>>> Alan.
>>>>
>>>>
>>>> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
>>>>
>>>>> You have probably heard by now that there is a discussion going on
>>>>> in the Hadoop PMC as to whether a number of the subprojects (HBase,
>>>>> Avro, ZooKeeper, Hive, and Pig) should move out from under the
>>>>> Hadoop umbrella and become top level Apache projects (TLP).  This
>>>>> discussion has picked up recently since the Apache board has
>>>>> clearly communicated to the Hadoop PMC that it is concerned that
>>>>> Hadoop is acting as an umbrella project with many disjoint
>>>>> subprojects underneath it.  They are concerned that this gives
>>>>> Apache little insight into the health and happenings of the
>>>>> subproject communities, which in turn means Apache cannot properly
>>>>> mentor those communities.
>>>>>
>>>>> The purpose of this email is to start a discussion within the Pig
>>>>> community about this topic.  Let me cover first what becoming a TLP
>>>>> would mean for Pig, and then I'll go into what options I think we
>>>>> as a community have.
>>>>>
>>>>> Becoming a TLP would mean that Pig would itself have a PMC that
>>>>> would report directly to the Apache board.  Who would be on the PMC
>>>>> would be something we as a community would need to decide.  Common
>>>>> options would be to say all active committers are on the PMC, or
>>>>> all active committers who have been a committer for at least a
>>>>> year.  We would also need to elect a chair of the PMC.  This lucky
>>>>> person would have no additional power, but would have the
>>>>> additional responsibility of writing quarterly reports on Pig's
>>>>> status for Apache board meetings, as well as coordinating with
>>>>> Apache to get accounts for new committers, etc.  For more
>>>>> information see
>>>>> http://www.apache.org/foundation/how-it-works.html#roles
>>>>>
>>>>> Becoming a TLP would not mean that we are ostracized from the
>>>>> Hadoop community.  We would continue to be invited to Hadoop
>>>>> Summits, HUGs, etc.  Since all Pig developers and users are by
>>>>> definition Hadoop users, we would continue to be a strong presence
>>>>> in the Hadoop community.
>>>>>
>>>>> I see three ways that we as a community can respond to this:
>>>>>
>>>>> 1) Say yes, we want to be a TLP now.
>>>>> 2) Say yes, we want to be a TLP, but not yet.  We feel we need more
>>>>> time to mature.  If we choose this option we need to be able to
>>>>> clearly articulate how much time we need and what we hope to see
>>>>> change in that time.
>>>>> 3) Say no, we feel the benefits for us staying with Hadoop outweigh
>>>>> the drawbacks of being a disjoint subproject.  If we choose this,
>>>>> we need to be able to say exactly what those benefits are and why we
>>>>> feel they will be compromised by leaving the Hadoop project.
>>>>>
>>>>> There may be other options that I haven't thought of.  Please feel
>>>>> free to suggest any you think of.
>>>>>
>>>>> Questions?  Thoughts?  Let the discussion begin.
>>>>>
>>>>> Alan.
>>>>>
>>>>>
>>>>
>>
> 
