hadoop-pig-dev mailing list archives

From hc busy <hc.b...@gmail.com>
Subject Re: Begin a discussion about Pig as a top level project
Date Mon, 05 Apr 2010 20:53:40 GMT
>The Twitter office is cushier and has more bars within stumbling
distance. Just sayin'.

and strip clubs too, I gather there're a couple on Market... near civic bart
stop ;-)

oh... hey, you guys are at a nice place... lots of night clubs near there
too.


> "Given that, do you think it makes sense to say that Pig stays a
subproject for now, but if it someday grows beyond Hadoop only it becomes a
TLP?  I could agree to that stance."


Oops, I didn't read your whole message... I think TLP could be part of the
roadmap: Planned publicity, like planned pregnancy, is a good thing.

And on the way there, we should add a dedicated resource that updates
documentation and links on the website... :-)




On Mon, Apr 5, 2010 at 12:10 PM, Dmitriy Ryaboy <dvryaboy@gmail.com> wrote:

> The Twitter office is cushier and has more bars within stumbling distance.
> Just sayin'.
>
> To the subject at hand -- I don't think TLP standing has the PR value you
> think it does... feature set, velocity of development, adoption,
> flexibility, etc -- those are far more important.
>
> -Dmitriy
>
> On Mon, Apr 5, 2010 at 11:58 AM, hc busy <hc.busy@gmail.com> wrote:
>
> > > Of course I'd love it if someday there is an ISO Pig Latin committee
> > (with
> > meetings in cool exotic places) deciding the official standard for Pig
> > Latin.
> >
> > haha!!! Some exotic place like Yahoo's HQ in sunny Sunnyvale, California?
> >
> > I guess it feels like it depends on the roadmap more than roadmap depends
> > on
> > it. In terms of positioning, a TLP would appear to potential users who
> are
> > evaluating alternatives to consider it as _the_ choice as opposed to one
> of
> > the choices. If the ambition is to take it there, then TLP, as useless as
> > it
> > may seem right now, might actually be worth the effort to attain.
> >
> > I mean, would you rather wait until Hive makes TLP and then play catch
> up?
> > I
> > mean, I can kinda see them doing that...
> >
> >
> >
> >
> > On Mon, Apr 5, 2010 at 11:36 AM, Alan Gates <gates@yahoo-inc.com> wrote:
> >
> > > Prognostication is a difficult business.  Of course I'd love it if
> > someday
> > > there is an ISO Pig Latin committee (with meetings in cool exotic
> places)
> > > deciding the official standard for Pig Latin.  But that seems like
> saying
> > in
> > > your start up's business plan, "When we reach Google's size, then we'll
> > do
> > > x".  If there ever is an ISO Pig Latin standard it will be years off.
> > >
> > > As others have noted, staying tight to Hadoop now has many advantages,
> > both
> > > in technical and adoption terms.  Hence my advocacy of keeping Pig
> Latin
> > > Hadoop agnostic while tightly integrating the backend.  Which is to say
> > that
> > > in my view, Pig is Hadoop specific now, but there may come a day when
> > that
> > > is no longer true.   Whether Pig will ever move past just running on
> > Hadoop
> > > to running in other parallel systems won't be known for years to come.
> > >  Given that, do you think it makes sense to say that Pig stays a
> > subproject
> > > for now, but if it someday grows beyond Hadoop only it becomes a TLP?
>  I
> > > could agree to that stance.
> > >
> > > Alan.
> > >
> > >
> > > On Apr 3, 2010, at 12:43 PM, Santhosh Srinivasan wrote:
> > >
> > >  I see this as a multi-part question. Looking back at some of the
> > >> significant roadmap/existential questions asked in the last 12 months,
> I
> > >> see the following:
> > >>
> > >> 1. With the introduction of SQL, what is the philosophy of Pig (I sent
> > >> an email about this approximately 9 months ago)
> > >> 2. What is the approach to support backward compatibility in Pig (Alan
> > >> had sent an email about this 3 months ago)
> > >> 3. Should Pig be a TLP (the current email thread).
> > >>
> > >> Here is my take on answering the aforementioned questions.
> > >>
> > >> The initial philosophy of Pig was to be backend agnostic. It was
> > >> designed as a data flow language. Whenever a new language is designed,
> > >> the syntax and semantics of the language have to be laid out. The
> syntax
> > >> is usually captured in the form of a BNF grammar. The semantics are
> > >> defined by the language creators. Backward compatibility is then a
> > >> question of holding true to the syntax and semantics. With Pig, in
> > >> addition to the language, the Java APIs were exposed to customers to
> > >> implement UDFs (load/store/filter/grouping/row transformation etc),
> > >> provision looping since the language does not support looping
> constructs
> > >> and also support a programmatic mode of access. Backward compatibility
> > >> in this context is to support API versioning.
> > >>
> > >> Do we still intend to position as a data flow language that is backend
> > >> agnostic? If the answer is yes, then there is a strong case for making
> > >> Pig a TLP.
> > >>
> > >> Are we influenced by Hadoop? A big YES! The reason Pig chose to become
> a
> > >> Hadoop sub-project was to ride the Hadoop popularity wave. As a
> > >> consequence, we chose to be heavily influenced by the Hadoop roadmap.
> > >>
> > >> Like a good lawyer, I also have rebuttals to Alan's questions :)
> > >>
> > >> 1. Search engine popularity - We can discuss this with the Hadoop team
> > >> and still retain links to TLPs that are coupled (loosely or tightly).
> > >> 2. Explicit connection to Hadoop - I see this as logical connection
> v/s
> > >> physical connection. Today, we are physically connected as a
> > >> sub-project. Becoming a TLP will not increase/decrease our influence
> on
> > >> the Hadoop community (think Logical, Physical and MR Layers :)
> > >> 3. Philosophy - I have already talked about this. The tight coupling
> is
> > >> by choice. If Pig continues to be a data flow language with clear
> syntax
> > >> and semantics then someone can implement Pig on top of a different
> > >> backend. Do we intend to take this approach?
> > >>
> > >> I just wanted to offer a different opinion to this thread. I strongly
> > >> believe that we should think about the original philosophy. Will we
> have
> > >> a Pig standards committee that will decide on the changes to the
> > >> language (think C/C++) if there are multiple backend implementations?
> > >>
> > >> I will reserve my vote based on the outcome of the philosophy and
> > >> backward compatibility discussions. If we decide that Pig will be
> > >> treated and maintained like a true language with clear syntax and
> > >> semantics then we have a strong case to make it into a TLP. If not, we
> > >> should retain our existing ties to Hadoop and make Pig into a data
> flow
> > >> language for Hadoop.
> > >>
> > >> Santhosh
> > >>
> > >> -----Original Message-----
> > >> From: Thejas Nair [mailto:tejas@yahoo-inc.com]
> > >> Sent: Friday, April 02, 2010 4:08 PM
> > >> To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
> > >> Subject: Re: Begin a discussion about Pig as a top level project
> > >>
> > >> I agree with Alan and Dmitriy - Pig is tightly coupled with hadoop,
> and
> > >> heavily influenced by its roadmap. I think it makes sense to continue
> as
> > >> a sub-project of hadoop.
> > >>
> > >> -Thejas
> > >>
> > >>
> > >>
> > >> On 3/31/10 4:04 PM, "Dmitriy Ryaboy" <dvryaboy@gmail.com> wrote:
> > >>
> > >>  Over time, Pig is increasing its coupling to Hadoop (for good
> > >>> reasons), rather than decreasing it. If and when Pig becomes a viable
> > >>> entity without hadoop around, it might make sense as a TLP. As is,
I
> > >>> think becoming a TLP will only introduce unnecessary administrative
> > >>>
> > >> and bureaucratic headaches.
> > >>
> > >>> So my vote is also -1.
> > >>>
> > >>> -Dmitriy
> > >>>
> > >>>
> > >>>
> > >>> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <gates@yahoo-inc.com>
> > >>>
> > >> wrote:
> > >>
> > >>>
> > >>>  So far I haven't seen any feedback on this.  Apache has asked the
> > >>>> Hadoop PMC to submit input in April on whether some subprojects
> > >>>> should be promoted to TLPs.  We, the Pig community, need to give
> > >>>> feedback to the Hadoop PMC on how we feel about this.  Please make
> > >>>>
> > >>> your voice heard.
> > >>
> > >>>
> > >>>> So now I'll heed my own call and give my thoughts on it.
> > >>>>
> > >>>> The biggest advantage I see to being a TLP is a direct connection
to
> > >>>> Apache.  Right now all of the Pig team's interaction with Apache
is
> > >>>> through the Hadoop PMC.  Being directly connected to Apache would
> > >>>> benefit Pig team members who would have a better view into Apache.
> > >>>> It would also raise our profile in Apache and thus make other
> > >>>>
> > >>> projects more aware of us.
> > >>
> > >>>
> > >>>> However, I am concerned about losing Pig's explicit connection to
> > >>>> Hadoop.
> > >>
> > >>> This concern has a couple of dimensions.  One, Hadoop and MapReduce
> > >>>> are the current flavor of the month in computing.  Given that Pig
> > >>>> shares a name with the common farm animal, it's hard to be sure
> based
> > >>>>
> > >>> on search statistics.
> > >>
> > >>> But Google trends shows that "hadoop" is searched on much more
> > >>>> frequently than "hadoop pig" or "apache pig" (see
> > >>>> http://www.google.com/trends?q=hadoop%2Chadoop+pig).  I am guessing
> > >>>> that most Pig users come from Hadoop users who discover Pig via
> > >>>>
> > >>> Hadoop's website.
> > >>
> > >>> Losing that subproject tab on Hadoop's front page may radically
> > >>>> lower the number of users coming to Pig to check out our project.
 I
> > >>>> would argue that this benefits Hadoop as well, since high level
> > >>>> languages like Pig Latin have the potential to greatly extend the
> > >>>>
> > >>> user base and usability of Hadoop.
> > >>
> > >>>
> > >>>> Two, being explicitly connected to Hadoop keeps our two communities
> > >>>> aware of each other's needs.  There are features proposed for MR
that
> > >>>> would greatly help Pig.  By staying in the Hadoop community Pig
is
> > >>>> better positioned to advocate for and help implement and test those
> > >>>> features.  The response to this will be that Pig developers can
> still
> > >>>>
> > >>>
> > >>  subscribe to Hadoop mailing lists, submit patches, etc.  That is,
> > >>>> they can still be part of the Hadoop community.  Which reinforces
my
> > >>>> point that it makes more sense to leave Pig in the Hadoop community
> > >>>> since Pig developers will need to be part of that community anyway.
> > >>>>
> > >>>> Finally, philosophically it makes sense to me that projects that
are
> > >>>> tightly connected belong together.  It strikes me as strange to
have
> > >>>> Pig as a TLP completely dependent on another TLP.  Hadoop was
> > >>>> originally a subproject of Lucene.  It moved out to be a TLP when
it
> > >>>> became obvious that Hadoop had become independent of and useful
> apart
> > >>>>
> > >>>
> > >>  from Lucene.  Pig is not in that position relative to Hadoop.
> > >>>>
> > >>>> So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open
to
> > >>>> being persuaded that I'm wrong or my concerns can be addressed
while
> > >>>> still having Pig as a TLP.
> > >>>>
> > >>>> Alan.
> > >>>>
> > >>>>
> > >>>> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
> > >>>>
> > >>>> You have probably heard by now that there is a discussion going
on
> > >>>> in the
> > >>>>
> > >>>>> Hadoop PMC as to whether a number of the subprojects (Hbase,
Avro,
> > >>>>> Zookeeper, Hive, and Pig) should move out from under the Hadoop
> > >>>>> umbrella and become top level Apache projects (TLP).  This
> > >>>>> discussion has picked up recently since the Apache board has
> clearly
> > >>>>>
> > >>>>
> > >>  communicated to the Hadoop PMC that it is concerned that Hadoop is
> > >>>>> acting as an umbrella project with many disjoint subprojects
> > >>>>> underneath it.  They are concerned that this gives Apache little
> > >>>>> insight into the health and happenings of the subproject
> communities
> > >>>>>
> > >>>>
> > >>  which in turn means Apache cannot properly mentor those communities.
> > >>>>>
> > >>>>> The purpose of this email is to start a discussion within the
Pig
> > >>>>> community about this topic.  Let me cover first what becoming a TLP
> > >>>>> would mean for Pig, and then I'll go into what options I think
we
> as
> > >>>>>
> > >>>> a community have.
> > >>
> > >>>
> > >>>>> Becoming a TLP would mean that Pig would itself have a PMC
that
> > >>>>> would report directly to the Apache board.  Who would be on
the PMC
> > >>>>> would be something we as a community would need to decide.
 Common
> > >>>>> options would be to say all active committers are on the PMC,
or
> all
> > >>>>>
> > >>>>
> > >>  active committers who have been a committer for at least a year.  We
> > >>>>>
> > >>>>
> > >>  would also need to elect a chair of the PMC.  This lucky person
> > >>>>> would have no additional power, but would have the additional
> > >>>>> responsibility of writing quarterly reports on Pig's status
for
> > >>>>> Apache board meetings, as well as coordinating with Apache
to get
> > >>>>> accounts for new  committers, etc.  For more information see
> > >>>>> http://www.apache.org/foundation/how-it-works.html#roles
> > >>>>>
> > >>>>> Becoming a TLP would not mean that we are ostracized from the
> Hadoop
> > >>>>>
> > >>>>
> > >>  community.  We would continue to be invited to Hadoop Summits, HUGs,
> > >>>>>
> > >>>> etc.
> > >>
> > >>> Since all Pig developers and users are by definition Hadoop users,
> > >>>>> we would continue to be a strong presence in the Hadoop community.
> > >>>>>
> > >>>>> I see three ways that we as a community can respond to this:
> > >>>>>
> > >>>>> 1) Say yes, we want to be a TLP now.
> > >>>>> 2) Say yes, we want to be a TLP, but not yet.  We feel we need
more
> > >>>>> time to mature.  If we choose this option we need to be able
to
> > >>>>> clearly articulate how much time we need and what we hope to
see
> > >>>>> change in that time.
> > >>>>> 3) Say no, we feel the benefits for us staying with Hadoop
outweigh
> > >>>>> the drawbacks of being a disjoint subproject.  If we choose
this,
> we
> > >>>>>
> > >>>>
> > >>  need to be able to say exactly what those benefits are and why we
> > >>>>> feel they will be compromised by leaving the Hadoop project.
> > >>>>>
> > >>>>> There may be other options that I haven't thought of.  Please
feel
> free
> > >>>>>
> > >>>>
> > >>  to suggest any you think of.
> > >>>>>
> > >>>>> Questions?  Thoughts?  Let the discussion begin.
> > >>>>>
> > >>>>> Alan.
> > >>>>>
> > >>>>>
> > >>>>>
> > >>>>
> > >>
> > >
> >
>
