hadoop-pig-dev mailing list archives

From hc busy <hc.b...@gmail.com>
Subject Re: Begin a discussion about Pig as a top level project
Date Mon, 05 Apr 2010 18:51:08 GMT
I guess this is more of a roadmap suggestion than a TLP discussion, but I
think the PMC/committers could create a dedicated position that maintains the
website and docs: somebody who yells and screams until the docs are in sync
with the implementation before each release.

Because TLP is an elevation of status in addition to an internal
re-organization, I think it might help create the PR needed to attract the
talent to fill that job...


On Mon, Apr 5, 2010 at 11:23 AM, Alan Gates <gates@yahoo-inc.com> wrote:

> I agree that Pig's code documentation is in sad shape.  I think our user
> documentation for each release is good, if limited.  I hope that our
> documents on the wiki (such as PigJournal) help people understand our roadmap.
> Please let us know if you disagree so we can find ways to improve it.
>
> That said, it isn't clear to me how Pig being a TLP will solve that.  The
> current committers or some subset thereof (see original message) would
> become the PMC.  Other than having expanded powers to vote on releases and
> who becomes new committers, the role of these new PMC members would not
> change much.  They won't have anymore time to address documentation and
> communication issues.  We need to find a way to address those no matter what
> governance framework or community Pig is in.
>
> Alan.
>
>
> On Apr 5, 2010, at 9:02 AM, hc busy wrote:
>
>> This is awesome!!! As much as I hate PJMs for wasting time at all the
>> places that I've worked at, I think formalizing the management group (PMC)
>> to openly and clearly determine the feature roadmap and dev schedule is the
>> best thing Pig can have.
>>
>> I once commented to my co-worker (also a heavy Pig user) that Pig's
>> organization (with all due respect to all you hardworking people) is like a
>> pigsty! Documentation all over the place, javadocs from three versions ago,
>> much of the documentation doesn't match actual features... links to the
>> download page are broken.
>>
>> If you look at Cascading's website... it's so much cleaner. (Of course...
>> we still use Pig because it works well.)
>>
>> I think as TLP, pig will receive better marketing and better support in a
>> way that will propel it both in popularity and in the amount of support it
>> receives.
>>
>> As a user, that change will be good for me.
>>
>>
>> On Sun, Apr 4, 2010 at 11:10 PM, Ashutosh Chauhan <
>> ashutosh.chauhan@gmail.com> wrote:
>>
>>> I concur with Santhosh here. I think the main question we need to answer
>>> here is how close our ties with Hadoop are currently and how close they
>>> will be in the future. When Pig was originally designed the intent was to
>>> keep it backend neutral, so much so that there was a reference backend
>>> implementation (also known as the local engine) which had nothing to do
>>> with Hadoop. But things have changed since then. Hadoop's local mode
>>> has been adopted in favor of Pig's own local mode. We have moved from being
>>> backend agnostic to Hadoop favoring. And while this was happening, it
>>> seems we tried to keep the Pig Latin language independent of the Hadoop
>>> backend while the Pig runtime started to make use of Hadoop concepts.
>>>
>>> Apart from design decisions, this move also has a practical impact on
>>> our codebase. Since we adopted Hadoop more closely, we got rid of an
>>> extra layer of abstraction and instead started using similar
>>> abstractions already existing in Hadoop. This has had the positive effect
>>> of simplifying the codebase and providing tighter integration with
>>> Hadoop.
>>> So, if we are continuing in a direction where Hadoop is our only
>>> backend (or at least a favored one), close ties to Hadoop are useful
>>> because of the reasons Alan and Dmitriy pointed out. If not, then I
>>> think moving out to TLP makes sense. Since there are no efforts that
>>> I am aware of to plug in a different backend for Pig, I
>>> think maintaining close ties with Hadoop is useful for Pig. In the future,
>>> if a different distributed computing platform comes up
>>> which we want to use as a backend, we can revisit our decision. So, as
>>> things stand today, I am -1 on moving out of Hadoop.
>>>
>>> And I would also like to reiterate my point that though Pig runtime
>>> may continue to get closer to Hadoop, we shall keep Pig Latin
>>> completely backend agnostic.
>>>
>>> Ashutosh
>>>
>>> On Sat, Apr 3, 2010 at 12:43, Santhosh Srinivasan <sms@yahoo-inc.com>
>>> wrote:
>>>
>>>> I see this as a multi-part question. Looking back at some of the
>>>> significant roadmap/existential questions asked in the last 12 months, I
>>>> see the following:
>>>>
>>>> 1. With the introduction of SQL, what is the philosophy of Pig? (I sent
>>>> an email about this approximately 9 months ago)
>>>> 2. What is the approach to supporting backward compatibility in Pig? (Alan
>>>> sent an email about this 3 months ago)
>>>> 3. Should Pig be a TLP? (the current email thread)
>>>>
>>>> Here is my take on answering the aforementioned questions.
>>>>
>>>> The initial philosophy of Pig was to be backend agnostic. It was
>>>> designed as a data flow language. Whenever a new language is designed,
>>>> the syntax and semantics of the language have to be laid out. The syntax
>>>> is usually captured in the form of a BNF grammar. The semantics are
>>>> defined by the language creators. Backward compatibility is then a
>>>> question of holding true to the syntax and semantics. With Pig, in
>>>> addition to the language, Java APIs were exposed to customers to
>>>> implement UDFs (load/store/filter/grouping/row transformation, etc.), to
>>>> provide looping since the language does not support looping constructs,
>>>> and to support a programmatic mode of access. Backward compatibility
>>>> in this context means supporting API versioning.
>>>>
>>>> Do we still intend to position Pig as a data flow language that is
>>>> backend agnostic? If the answer is yes, then there is a strong case for
>>>> making Pig a TLP.
>>>>
>>>> Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
>>>> Hadoop sub-project was to ride the Hadoop popularity wave. As a
>>>> consequence, we chose to be heavily influenced by the Hadoop roadmap.
>>>>
>>>> Like a good lawyer, I also have rebuttals to Alan's questions :)
>>>>
>>>> 1. Search engine popularity - We can discuss this with the Hadoop team
>>>> and still retain links to TLPs that are coupled (loosely or tightly).
>>>> 2. Explicit connection to Hadoop - I see this as a logical connection vs.
>>>> a physical connection. Today, we are physically connected as a
>>>> sub-project. Becoming a TLP will not increase/decrease our influence on
>>>> the Hadoop community (think Logical, Physical and MR Layers :)
>>>> 3. Philosophy - I have already talked about this. The tight coupling is
>>>> by choice. If Pig continues to be a data flow language with clear syntax
>>>> and semantics then someone can implement Pig on top of a different
>>>> backend. Do we intend to take this approach?
>>>>
>>>> I just wanted to offer a different opinion to this thread. I strongly
>>>> believe that we should think about the original philosophy. Will we have
>>>> a Pig standards committee that will decide on the changes to the
>>>> language (think C/C++) if there are multiple backend implementations?
>>>>
>>>> I will reserve my vote pending the outcome of the philosophy and
>>>> backward compatibility discussions. If we decide that Pig will be
>>>> treated and maintained like a true language with clear syntax and
>>>> semantics then we have a strong case to make it into a TLP. If not, we
>>>> should retain our existing ties to Hadoop and make Pig into a data flow
>>>> language for Hadoop.
>>>>
>>>> Santhosh
>>>>
>>>> -----Original Message-----
>>>> From: Thejas Nair [mailto:tejas@yahoo-inc.com]
>>>> Sent: Friday, April 02, 2010 4:08 PM
>>>> To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
>>>> Subject: Re: Begin a discussion about Pig as a top level project
>>>>
>>>> I agree with Alan and Dmitriy - Pig is tightly coupled with hadoop, and
>>>> heavily influenced by its roadmap. I think it makes sense to continue as
>>>> a sub-project of hadoop.
>>>>
>>>> -Thejas
>>>>
>>>>
>>>>
>>>> On 3/31/10 4:04 PM, "Dmitriy Ryaboy" <dvryaboy@gmail.com> wrote:
>>>>
>>>>> Over time, Pig is increasing its coupling to Hadoop (for good
>>>>> reasons), rather than decreasing it. If and when Pig becomes a viable
>>>>> entity without hadoop around, it might make sense as a TLP. As is, I
>>>>> think becoming a TLP will only introduce unnecessary administrative
>>>>> and bureaucratic headaches.
>>>>>
>>>>> So my vote is also -1.
>>>>>
>>>>> -Dmitriy
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates <gates@yahoo-inc.com> wrote:
>>>>>
>>>>>> So far I haven't seen any feedback on this.  Apache has asked the
>>>>>> Hadoop PMC to submit input in April on whether some subprojects
>>>>>> should be promoted to TLPs.  We, the Pig community, need to give
>>>>>> feedback to the Hadoop PMC on how we feel about this.  Please make
>>>>>> your voice heard.
>>>>>>
>>>>>> So now I'll heed my own call and give my thoughts on it.
>>>>>>
>>>>>> The biggest advantage I see to being a TLP is a direct connection to
>>>>>> Apache.  Right now all of the Pig team's interaction with Apache is
>>>>>> through the Hadoop PMC.  Being directly connected to Apache would
>>>>>> benefit Pig team members who would have a better view into Apache.
>>>>>> It would also raise our profile in Apache and thus make other
>>>>>> projects more aware of us.
>>>>>>
>>>>>> However, I am concerned about losing Pig's explicit connection to
>>>>>> Hadoop.  This concern has a couple of dimensions.  One, Hadoop and
>>>>>> MapReduce are the current flavor of the month in computing.  Given that
>>>>>> Pig shares a name with the common farm animal, it's hard to be sure
>>>>>> based on search statistics.  But Google Trends shows that "hadoop" is
>>>>>> searched on much more frequently than "hadoop pig" or "apache pig" (see
>>>>>> http://www.google.com/trends?q=hadoop%2Chadoop+pig).  I am guessing
>>>>>> that most Pig users come from Hadoop users who discover Pig via
>>>>>> Hadoop's website.  Losing that subproject tab on Hadoop's front page
>>>>>> may radically lower the number of users coming to Pig to check out our
>>>>>> project.  I would argue that this benefits Hadoop as well, since high
>>>>>> level languages like Pig Latin have the potential to greatly extend the
>>>>>> user base and usability of Hadoop.
>>>>>>
>>>>>> Two, being explicitly connected to Hadoop keeps our two communities
>>>>>> aware of each other's needs.  There are features proposed for MR that
>>>>>> would greatly help Pig.  By staying in the Hadoop community Pig is
>>>>>> better positioned to advocate for and help implement and test those
>>>>>> features.  The response to this will be that Pig developers can still
>>>>>> subscribe to Hadoop mailing lists, submit patches, etc.  That is,
>>>>>> they can still be part of the Hadoop community.  This reinforces my
>>>>>> point that it makes more sense to leave Pig in the Hadoop community,
>>>>>> since Pig developers will need to be part of that community anyway.
>>>>>>
>>>>>> Finally, philosophically it makes sense to me that projects that are
>>>>>> tightly connected belong together.  It strikes me as strange to have
>>>>>> Pig as a TLP completely dependent on another TLP.  Hadoop was
>>>>>> originally a subproject of Lucene.  It moved out to be a TLP when it
>>>>>> became obvious that Hadoop had become independent of and useful apart
>>>>>> from Lucene.  Pig is not in that position relative to Hadoop.
>>>>>>
>>>>>> So, I'm -1 on Pig moving out.  But this is a soft -1.  I'm open to
>>>>>> being persuaded that I'm wrong or my concerns can be addressed while
>>>>>> still having Pig as a TLP.
>>>>>>
>>>>>> Alan.
>>>>>>
>>>>>>
>>>>>> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
>>>>>>
>>>>>>> You have probably heard by now that there is a discussion going on
>>>>>>> in the Hadoop PMC as to whether a number of the subprojects (HBase,
>>>>>>> Avro, ZooKeeper, Hive, and Pig) should move out from under the Hadoop
>>>>>>> umbrella and become top level Apache projects (TLP).  This
>>>>>>> discussion has picked up recently since the Apache board has clearly
>>>>>>> communicated to the Hadoop PMC that it is concerned that Hadoop is
>>>>>>> acting as an umbrella project with many disjoint subprojects
>>>>>>> underneath it.  They are concerned that this gives Apache little
>>>>>>> insight into the health and happenings of the subproject communities,
>>>>>>> which in turn means Apache cannot properly mentor those communities.
>>>>>>>
>>>>>>> The purpose of this email is to start a discussion within the Pig
>>>>>>> community about this topic.  Let me first cover what becoming a TLP
>>>>>>> would mean for Pig, and then I'll go into what options I think we as
>>>>>>> a community have.
>>>>>>>
>>>>>>> Becoming a TLP would mean that Pig would itself have a PMC that
>>>>>>> would report directly to the Apache board.  Who would be on the PMC
>>>>>>> would be something we as a community would need to decide.  Common
>>>>>>> options would be to say all active committers are on the PMC, or all
>>>>>>> active committers who have been a committer for at least a year.  We
>>>>>>> would also need to elect a chair of the PMC.  This lucky person
>>>>>>> would have no additional power, but would have the additional
>>>>>>> responsibility of writing quarterly reports on Pig's status for
>>>>>>> Apache board meetings, as well as coordinating with Apache to get
>>>>>>> accounts for new committers, etc.  For more information see
>>>>>>> http://www.apache.org/foundation/how-it-works.html#roles
>>>>>>>
>>>>>>> Becoming a TLP would not mean that we are ostracized from the Hadoop
>>>>>>> community.  We would continue to be invited to Hadoop Summits, HUGs,
>>>>>>> etc.  Since all Pig developers and users are by definition Hadoop
>>>>>>> users, we would continue to be a strong presence in the Hadoop
>>>>>>> community.
>>>>>>>
>>>>>>> I see three ways that we as a community can respond to this:
>>>>>>>
>>>>>>> 1) Say yes, we want to be a TLP now.
>>>>>>> 2) Say yes, we want to be a TLP, but not yet.  We feel we need more
>>>>>>> time to mature.  If we choose this option we need to be able to
>>>>>>> clearly articulate how much time we need and what we hope to see
>>>>>>> change in that time.
>>>>>>> 3) Say no, we feel the benefits of staying with Hadoop outweigh
>>>>>>> the drawbacks of being a disjoint subproject.  If we choose this, we
>>>>>>> need to be able to say exactly what those benefits are and why we
>>>>>>> feel they will be compromised by leaving the Hadoop project.
>>>>>>>
>>>>>>> There may be other options that I haven't thought of.  Please feel
>>>>>>> free to suggest any you think of.
>>>>>>>
>>>>>>> Questions?  Thoughts?  Let the discussion begin.
>>>>>>>
>>>>>>> Alan.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>>
>>>
>
