pig-dev mailing list archives

From Dmitriy Ryaboy <dvrya...@gmail.com>
Subject Re: Begin a discussion about Pig as a top level project
Date Mon, 05 Apr 2010 19:10:10 GMT
The Twitter office is cushier and has more bars within stumbling distance.
Just sayin'.

To the subject at hand -- I don't think TLP standing has the PR value you
think it does... feature set, velocity of development, adoption,
flexibility, etc -- those are far more important.

-Dmitriy

On Mon, Apr 5, 2010 at 11:58 AM, hc busy <hc.busy@gmail.com> wrote:

> > Of course I'd love it if someday there is an ISO Pig Latin committee
> > (with meetings in cool exotic places) deciding the official standard
> > for Pig Latin.
>
> haha!!! Some exotic place like Yahoo's HQ in sunny Sunnyvale, California?
>
> I guess TLP status depends on the roadmap more than the roadmap depends
> on it. In terms of positioning, being a TLP would signal to potential users
> who are evaluating alternatives that Pig is _the_ choice, as opposed to
> one of the choices. If the ambition is to take it there, then TLP status,
> as useless as it may seem right now, might actually be worth the effort
> to attain.
>
> I mean, would you rather wait until Hive makes TLP and then play catch-up?
> I mean, I can kinda see them doing that...
>
> On Mon, Apr 5, 2010 at 11:36 AM, Alan Gates <gates@yahoo-inc.com> wrote:
>
> > Prognostication is a difficult business. Of course I'd love it if
> > someday there is an ISO Pig Latin committee (with meetings in cool
> > exotic places) deciding the official standard for Pig Latin. But that
> > seems like saying in your startup's business plan, "When we reach
> > Google's size, then we'll do x". If there ever is an ISO Pig Latin
> > standard, it will be years off.
> >
> > As others have noted, staying tight to Hadoop now has many advantages,
> > both in technical and adoption terms. Hence my advocacy of keeping Pig
> > Latin Hadoop-agnostic while tightly integrating the backend. That is to
> > say, in my view Pig is Hadoop-specific now, but there may come a day
> > when that is no longer true. Whether Pig will ever move past just
> > running on Hadoop to running in other parallel systems won't be known
> > for years to come. Given that, do you think it makes sense to say that
> > Pig stays a subproject for now, but becomes a TLP if it someday grows
> > beyond running only on Hadoop? I could agree to that stance.
> >
> > Alan.
> >
> >
> > On Apr 3, 2010, at 12:43 PM, Santhosh Srinivasan wrote:
> >
> >> I see this as a multi-part question. Looking back at some of the
> >> significant roadmap/existential questions asked in the last 12 months,
> >> I see the following:
> >>
> >> 1. With the introduction of SQL, what is the philosophy of Pig? (I sent
> >> an email about this approximately 9 months ago.)
> >> 2. What is our approach to supporting backward compatibility in Pig?
> >> (Alan sent an email about this 3 months ago.)
> >> 3. Should Pig be a TLP? (the current email thread)
> >>
> >> Here is my take on answering the aforementioned questions.
> >>
> >> The initial philosophy of Pig was to be backend-agnostic. It was
> >> designed as a data flow language. Whenever a new language is designed,
> >> its syntax and semantics have to be laid out. The syntax is usually
> >> captured in the form of a BNF grammar; the semantics are defined by the
> >> language creators. Backward compatibility is then a question of holding
> >> true to the syntax and semantics. With Pig, in addition to the language,
> >> Java APIs were exposed to customers so that they could implement UDFs
> >> (load/store/filter/grouping/row transformation, etc.), provide looping
> >> (since the language does not support looping constructs), and access Pig
> >> programmatically. For these APIs, backward compatibility means API
> >> versioning.
> >>
> >> Do we still intend to position Pig as a backend-agnostic data flow
> >> language? If the answer is yes, then there is a strong case for making
> >> Pig a TLP.
> >>
> >> Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
> >> Hadoop sub-project was to ride the Hadoop popularity wave. As a
> >> consequence, we chose to be heavily influenced by the Hadoop roadmap.
> >>
> >> Like a good lawyer, I also have rebuttals to Alan's questions :)
> >>
> >> 1. Search engine popularity - We can discuss this with the Hadoop team
> >> and still retain links to TLPs that are coupled (loosely or tightly).
> >> 2. Explicit connection to Hadoop - I see this as a logical connection
> >> vs. a physical connection. Today, we are physically connected as a
> >> sub-project. Becoming a TLP will not increase or decrease our influence
> >> on the Hadoop community (think Logical, Physical and MR Layers :)
> >> 3. Philosophy - I have already talked about this. The tight coupling is
> >> by choice. If Pig continues to be a data flow language with clear syntax
> >> and semantics, then someone can implement Pig on top of a different
> >> backend. Do we intend to take this approach?
> >>
> >> I just wanted to offer a different opinion in this thread. I strongly
> >> believe that we should think about the original philosophy. Will we have
> >> a Pig standards committee that decides on changes to the language
> >> (think C/C++) if there are multiple backend implementations?
> >>
> >> I will base my vote on the outcome of the philosophy and backward
> >> compatibility discussions. If we decide that Pig will be treated and
> >> maintained like a true language with clear syntax and semantics, then
> >> we have a strong case to make it a TLP. If not, we should retain our
> >> existing ties to Hadoop and make Pig a data flow language for Hadoop.
> >>
> >> Santhosh
> >>
> >> -----Original Message-----
> >> From: Thejas Nair [mailto:tejas@yahoo-inc.com]
> >> Sent: Friday, April 02, 2010 4:08 PM
> >> To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
> >> Subject: Re: Begin a discussion about Pig as a top level project
> >>
> >> I agree with Alan and Dmitriy - Pig is tightly coupled with Hadoop, and
> >> heavily influenced by its roadmap. I think it makes sense to continue as
> >> a sub-project of Hadoop.
> >>
> >> -Thejas
> >>
> >>
> >>
> >> On 3/31/10 4:04 PM, "Dmitriy Ryaboy" <dvryaboy@gmail.com> wrote:
> >>
> >>> Over time, Pig is increasing its coupling to Hadoop (for good
> >>> reasons), rather than decreasing it. If and when Pig becomes a viable
> >>> entity without Hadoop around, it might make sense as a TLP. As is, I
> >>> think becoming a TLP will only introduce unnecessary administrative
> >>> and bureaucratic headaches.
> >>>
> >>> So my vote is also -1.
> >>>
> >>> -Dmitriy
> >>>
> >>>
> >>>
> >>> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <gates@yahoo-inc.com> wrote:
> >>>
> >>>> So far I haven't seen any feedback on this. Apache has asked the
> >>>> Hadoop PMC to submit input in April on whether some subprojects
> >>>> should be promoted to TLPs. We, the Pig community, need to give
> >>>> feedback to the Hadoop PMC on how we feel about this. Please make
> >>>> your voice heard.
> >>>>
> >>>> So now I'll heed my own call and give my thoughts on it.
> >>>>
> >>>> The biggest advantage I see to being a TLP is a direct connection to
> >>>> Apache. Right now all of the Pig team's interaction with Apache goes
> >>>> through the Hadoop PMC. Being directly connected to Apache would
> >>>> benefit Pig team members, who would have a better view into Apache.
> >>>> It would also raise our profile in Apache and thus make other
> >>>> projects more aware of us.
> >>>>
> >>>> However, I am concerned about losing Pig's explicit connection to
> >>>> Hadoop. This concern has a couple of dimensions. One, Hadoop and
> >>>> MapReduce are the current flavor of the month in computing. Given that
> >>>> Pig shares a name with the common farm animal, it's hard to be sure
> >>>> based on search statistics, but Google Trends shows that "hadoop" is
> >>>> searched on much more frequently than "hadoop pig" or "apache pig" (see
> >>>> http://www.google.com/trends?q=hadoop%2Chadoop+pig). I am guessing
> >>>> that most Pig users come from Hadoop users who discover Pig via
> >>>> Hadoop's website. Losing that subproject tab on Hadoop's front page
> >>>> may radically lower the number of users coming to Pig to check out
> >>>> our project. I would argue that this arrangement benefits Hadoop as
> >>>> well, since high-level languages like Pig Latin have the potential to
> >>>> greatly extend the user base and usability of Hadoop.
> >>>>
> >>>> Two, being explicitly connected to Hadoop keeps our two communities
> >>>> aware of each other's needs. There are features proposed for MR that
> >>>> would greatly help Pig. By staying in the Hadoop community, Pig is
> >>>> better positioned to advocate for, help implement, and test those
> >>>> features. The response to this will be that Pig developers can still
> >>>> subscribe to Hadoop mailing lists, submit patches, etc. That is,
> >>>> they can still be part of the Hadoop community. This reinforces my
> >>>> point that it makes more sense to leave Pig in the Hadoop community,
> >>>> since Pig developers will need to be part of that community anyway.
> >>>>
> >>>> Finally, philosophically it makes sense to me that projects that are
> >>>> tightly connected belong together. It strikes me as strange to have
> >>>> Pig as a TLP completely dependent on another TLP. Hadoop was
> >>>> originally a subproject of Lucene. It moved out to be a TLP when it
> >>>> became obvious that Hadoop had become independent of and useful apart
> >>>> from Lucene. Pig is not in that position relative to Hadoop.
> >>>>
> >>>> So, I'm -1 on Pig moving out. But this is a soft -1. I'm open to
> >>>> being persuaded that I'm wrong, or that my concerns can be addressed
> >>>> with Pig as a TLP.
> >>>>
> >>>> Alan.
> >>>>
> >>>>
> >>>> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
> >>>>
> >>>>> You have probably heard by now that there is a discussion going on
> >>>>> in the Hadoop PMC as to whether a number of the subprojects (HBase,
> >>>>> Avro, ZooKeeper, Hive, and Pig) should move out from under the Hadoop
> >>>>> umbrella and become top level Apache projects (TLPs). This
> >>>>> discussion has picked up recently since the Apache board has clearly
> >>>>> communicated to the Hadoop PMC that it is concerned that Hadoop is
> >>>>> acting as an umbrella project with many disjoint subprojects
> >>>>> underneath it. They are concerned that this gives Apache little
> >>>>> insight into the health and happenings of the subproject communities,
> >>>>> which in turn means Apache cannot properly mentor those communities.
> >>>>>
> >>>>> The purpose of this email is to start a discussion within the Pig
> >>>>> community about this topic. Let me first cover what becoming a TLP
> >>>>> would mean for Pig, and then I'll go into what options I think we as
> >>>>> a community have.
> >>>>>
> >>>>> Becoming a TLP would mean that Pig would itself have a PMC that
> >>>>> would report directly to the Apache board. Who would be on the PMC
> >>>>> would be something we as a community would need to decide. Common
> >>>>> options would be to say that all active committers are on the PMC,
> >>>>> or all active committers who have been committers for at least a
> >>>>> year. We would also need to elect a chair of the PMC. This lucky
> >>>>> person would have no additional power, but would have the additional
> >>>>> responsibility of writing quarterly reports on Pig's status for
> >>>>> Apache board meetings, as well as coordinating with Apache to get
> >>>>> accounts for new committers, etc. For more information see
> >>>>> http://www.apache.org/foundation/how-it-works.html#roles
> >>>>>
> >>>>> Becoming a TLP would not mean that we are ostracized from the Hadoop
> >>>>> community. We would continue to be invited to Hadoop Summits, HUGs,
> >>>>> etc. Since all Pig developers and users are by definition Hadoop
> >>>>> users, we would continue to be a strong presence in the Hadoop
> >>>>> community.
> >>>>>
> >>>>> I see three ways that we as a community can respond to this:
> >>>>>
> >>>>> 1) Say yes, we want to be a TLP now.
> >>>>> 2) Say yes, we want to be a TLP, but not yet. We feel we need more
> >>>>> time to mature. If we choose this option, we need to be able to
> >>>>> clearly articulate how much time we need and what we hope to see
> >>>>> change in that time.
> >>>>> 3) Say no, we feel the benefits to us of staying with Hadoop outweigh
> >>>>> the drawbacks of being a disjoint subproject. If we choose this, we
> >>>>> need to be able to say exactly what those benefits are and why we
> >>>>> feel they would be compromised by leaving the Hadoop project.
> >>>>>
> >>>>> There may be other options that I haven't thought of. Please feel
> >>>>> free to suggest any you think of.
> >>>>>
> >>>>> Questions?  Thoughts?  Let the discussion begin.
> >>>>>
> >>>>> Alan.
> >>>>>
> >>>>>
> >>>>>
> >>>>
> >>
> >
>
