Mailing-List: contact pig-dev-help@hadoop.apache.org; run by ezmlm
Reply-To: pig-dev@hadoop.apache.org
References: <088A0B616C8C1D4787DD686C6922A72A03161728@SNV-EXVS10.ds.corp.yahoo.com> <3CCCD20B-DEFC-4E01-94A3-ED5588AFB65F@yahoo-inc.com>
Date: Mon, 5 Apr 2010 12:10:10 -0700
Subject: Re: Begin a discussion about Pig as a top level project
From: Dmitriy Ryaboy
To: pig-dev@hadoop.apache.org
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=001636e9083dfa8e8304838213fa

--001636e9083dfa8e8304838213fa
Content-Type: text/plain; charset=ISO-8859-1

The Twitter office is cushier and has more bars within stumbling
distance. Just sayin'.

To the subject at hand -- I don't think TLP standing has the PR value
you think it does... feature set, velocity of development, adoption,
flexibility, etc -- those are far more important.

-Dmitriy

On Mon, Apr 5, 2010 at 11:58 AM, hc busy wrote:

> > Of course I'd love it if someday there is an ISO Pig Latin committee
> > (with meetings in cool exotic places) deciding the official standard
> > for Pig Latin.
>
> haha!!! Some exotic place like Yahoo's HQ in sunny Sunnyvale California?
>
> I guess it feels like it depends on the roadmap more than roadmap depends
> on it. In terms of positioning, a TLP would appear to potential users who
> are evaluating alternatives to consider it as _the_ choice as opposed to
> one of the choices. If the ambition is to take it there, then TLP, as
> useless as it may seem right now, might actually be worth the effort to
> attain.
>
> I mean, would you rather wait until Hive makes TLP and then play catch
> up? I mean, I can kinda see them doing that...
>
> On Mon, Apr 5, 2010 at 11:36 AM, Alan Gates wrote:
>
> > Prognostication is a difficult business. Of course I'd love it if
> > someday there is an ISO Pig Latin committee (with meetings in cool
> > exotic places) deciding the official standard for Pig Latin. But that
> > seems like saying in your start up's business plan, "When we reach
> > Google's size, then we'll do x". If there ever is an ISO Pig Latin
> > standard it will be years off.
> >
> > As others have noted, staying tight to Hadoop now has many advantages,
> > both in technical and adoption terms. Hence my advocacy of keeping Pig
> > Latin Hadoop agnostic while tightly integrating the backend. Which is
> > to say that in my view, Pig is Hadoop specific now, but there may come
> > a day when that is no longer true. Whether Pig will ever move past
> > just running on Hadoop to running in other parallel systems won't be
> > known for years to come. Given that, do you think it makes sense to
> > say that Pig stays a subproject for now, but if it someday grows
> > beyond Hadoop only it becomes a TLP? I could agree to that stance.
> >
> > Alan.
> >
> > On Apr 3, 2010, at 12:43 PM, Santhosh Srinivasan wrote:
> >
> >> I see this as a multi-part question. Looking back at some of the
> >> significant roadmap/existential questions asked in the last 12 months, I
> >> see the following:
> >>
> >> 1. With the introduction of SQL, what is the philosophy of Pig (I sent
> >> an email about this approximately 9 months ago)
> >> 2. What is the approach to support backward compatibility in Pig (Alan
> >> had sent an email about this 3 months ago)
> >> 3. Should Pig be a TLP (the current email thread).
> >>
> >> Here is my take on answering the aforementioned questions.
> >>
> >> The initial philosophy of Pig was to be backend agnostic. It was
> >> designed as a data flow language. Whenever a new language is designed,
> >> the syntax and semantics of the language have to be laid out.
> >> The syntax is usually captured in the form of a BNF grammar. The
> >> semantics are defined by the language creators. Backward compatibility
> >> is then a question of holding true to the syntax and semantics. With
> >> Pig, in addition to the language, the Java APIs were exposed to
> >> customers to implement UDFs (load/store/filter/grouping/row
> >> transformation etc), provision looping since the language does not
> >> support looping constructs and also support a programmatic mode of
> >> access. Backward compatibility in this context is to support API
> >> versioning.
> >>
> >> Do we still intend to position as a data flow language that is backend
> >> agnostic? If the answer is yes, then there is a strong case for making
> >> Pig a TLP.
> >>
> >> Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
> >> Hadoop sub-project was to ride the Hadoop popularity wave. As a
> >> consequence, we chose to be heavily influenced by the Hadoop roadmap.
> >>
> >> Like a good lawyer, I also have rebuttals to Alan's questions :)
> >>
> >> 1. Search engine popularity - We can discuss this with the Hadoop team
> >> and still retain links to TLPs that are coupled (loosely or tightly).
> >> 2. Explicit connection to Hadoop - I see this as logical connection v/s
> >> physical connection. Today, we are physically connected as a
> >> sub-project. Becoming a TLP will not increase/decrease our influence on
> >> the Hadoop community (think Logical, Physical and MR Layers :)
> >> 3. Philosophy - I have already talked about this. The tight coupling is
> >> by choice. If Pig continues to be a data flow language with clear syntax
> >> and semantics then someone can implement Pig on top of a different
> >> backend. Do we intend to take this approach?
> >>
> >> I just wanted to offer a different opinion to this thread. I strongly
> >> believe that we should think about the original philosophy.
> >> Will we have a Pig standards committee that will decide on the changes
> >> to the language (think C/C++) if there are multiple backend
> >> implementations?
> >>
> >> I will reserve my vote based on the outcome of the philosophy and
> >> backward compatibility discussions. If we decide that Pig will be
> >> treated and maintained like a true language with clear syntax and
> >> semantics then we have a strong case to make it into a TLP. If not, we
> >> should retain our existing ties to Hadoop and make Pig into a data flow
> >> language for Hadoop.
> >>
> >> Santhosh
> >>
> >> -----Original Message-----
> >> From: Thejas Nair [mailto:tejas@yahoo-inc.com]
> >> Sent: Friday, April 02, 2010 4:08 PM
> >> To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
> >> Subject: Re: Begin a discussion about Pig as a top level project
> >>
> >> I agree with Alan and Dmitriy - Pig is tightly coupled with Hadoop, and
> >> heavily influenced by its roadmap. I think it makes sense to continue as
> >> a sub-project of Hadoop.
> >>
> >> -Thejas
> >>
> >> On 3/31/10 4:04 PM, "Dmitriy Ryaboy" wrote:
> >>
> >>> Over time, Pig is increasing its coupling to Hadoop (for good
> >>> reasons), rather than decreasing it. If and when Pig becomes a viable
> >>> entity without Hadoop around, it might make sense as a TLP. As is, I
> >>> think becoming a TLP will only introduce unnecessary administrative
> >>> and bureaucratic headaches.
> >>>
> >>> So my vote is also -1.
> >>>
> >>> -Dmitriy
> >>>
> >>> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates wrote:
> >>>
> >>>> So far I haven't seen any feedback on this. Apache has asked the
> >>>> Hadoop PMC to submit input in April on whether some subprojects
> >>>> should be promoted to TLPs. We, the Pig community, need to give
> >>>> feedback to the Hadoop PMC on how we feel about this. Please make
> >>>> your voice heard.
> >>>>
> >>>> So now I'll heed my own call and give my thoughts on it.
> >>>>
> >>>> The biggest advantage I see to being a TLP is a direct connection to
> >>>> Apache. Right now all of the Pig team's interaction with Apache is
> >>>> through the Hadoop PMC. Being directly connected to Apache would
> >>>> benefit Pig team members who would have a better view into Apache.
> >>>> It would also raise our profile in Apache and thus make other
> >>>> projects more aware of us.
> >>>>
> >>>> However, I am concerned about losing Pig's explicit connection to
> >>>> Hadoop. This concern has a couple of dimensions. One, Hadoop and
> >>>> MapReduce are the current flavor of the month in computing. Given
> >>>> that Pig shares a name with the common farm animal, it's hard to be
> >>>> sure based on search statistics. But Google Trends shows that
> >>>> "hadoop" is searched on much more frequently than "hadoop pig" or
> >>>> "apache pig" (see
> >>>> http://www.google.com/trends?q=hadoop%2Chadoop+pig). I am guessing
> >>>> that most Pig users come from Hadoop users who discover Pig via
> >>>> Hadoop's website. Losing that subproject tab on Hadoop's front page
> >>>> may radically lower the number of users coming to Pig to check out
> >>>> our project. I would argue that this benefits Hadoop as well, since
> >>>> high level languages like Pig Latin have the potential to greatly
> >>>> extend the user base and usability of Hadoop.
> >>>>
> >>>> Two, being explicitly connected to Hadoop keeps our two communities
> >>>> aware of each other's needs. There are features proposed for MR that
> >>>> would greatly help Pig. By staying in the Hadoop community Pig is
> >>>> better positioned to advocate for and help implement and test those
> >>>> features. The response to this will be that Pig developers can still
> >>>> subscribe to Hadoop mailing lists, submit patches, etc. That is,
> >>>> they can still be part of the Hadoop community.
> >>>> Which reinforces my point that it makes more sense to leave Pig in
> >>>> the Hadoop community since Pig developers will need to be part of
> >>>> that community anyway.
> >>>>
> >>>> Finally, philosophically it makes sense to me that projects that are
> >>>> tightly connected belong together. It strikes me as strange to have
> >>>> Pig as a TLP completely dependent on another TLP. Hadoop was
> >>>> originally a subproject of Lucene. It moved out to be a TLP when it
> >>>> became obvious that Hadoop had become independent of and useful apart
> >>>> from Lucene. Pig is not in that position relative to Hadoop.
> >>>>
> >>>> So, I'm -1 on Pig moving out. But this is a soft -1. I'm open to
> >>>> being persuaded that I'm wrong or my concerns can be addressed while
> >>>> still having Pig as a TLP.
> >>>>
> >>>> Alan.
> >>>>
> >>>> On Mar 19, 2010, at 10:59 AM, Alan Gates wrote:
> >>>>
> >>>>> You have probably heard by now that there is a discussion going on
> >>>>> in the Hadoop PMC as to whether a number of the subprojects (HBase,
> >>>>> Avro, ZooKeeper, Hive, and Pig) should move out from under the
> >>>>> Hadoop umbrella and become top level Apache projects (TLP). This
> >>>>> discussion has picked up recently since the Apache board has clearly
> >>>>> communicated to the Hadoop PMC that it is concerned that Hadoop is
> >>>>> acting as an umbrella project with many disjoint subprojects
> >>>>> underneath it. They are concerned that this gives Apache little
> >>>>> insight into the health and happenings of the subproject communities
> >>>>> which in turn means Apache cannot properly mentor those communities.
> >>>>>
> >>>>> The purpose of this email is to start a discussion within the Pig
> >>>>> community about this topic. Let me cover first what becoming TLP
> >>>>> would mean for Pig, and then I'll go into what options I think we as
> >>>>> a community have.
> >>>>>
> >>>>> Becoming a TLP would mean that Pig would itself have a PMC that
> >>>>> would report directly to the Apache board. Who would be on the PMC
> >>>>> would be something we as a community would need to decide. Common
> >>>>> options would be to say all active committers are on the PMC, or all
> >>>>> active committers who have been a committer for at least a year. We
> >>>>> would also need to elect a chair of the PMC. This lucky person
> >>>>> would have no additional power, but would have the additional
> >>>>> responsibility of writing quarterly reports on Pig's status for
> >>>>> Apache board meetings, as well as coordinating with Apache to get
> >>>>> accounts for new committers, etc. For more information see
> >>>>> http://www.apache.org/foundation/how-it-works.html#roles
> >>>>>
> >>>>> Becoming a TLP would not mean that we are ostracized from the Hadoop
> >>>>> community. We would continue to be invited to Hadoop Summits, HUGs,
> >>>>> etc. Since all Pig developers and users are by definition Hadoop
> >>>>> users, we would continue to be a strong presence in the Hadoop
> >>>>> community.
> >>>>>
> >>>>> I see three ways that we as a community can respond to this:
> >>>>>
> >>>>> 1) Say yes, we want to be a TLP now.
> >>>>> 2) Say yes, we want to be a TLP, but not yet. We feel we need more
> >>>>> time to mature. If we choose this option we need to be able to
> >>>>> clearly articulate how much time we need and what we hope to see
> >>>>> change in that time.
> >>>>> 3) Say no, we feel the benefits for us staying with Hadoop outweigh
> >>>>> the drawbacks of being a disjoint subproject. If we choose this, we
> >>>>> need to be able to say exactly what those benefits are and why we
> >>>>> feel they will be compromised by leaving the Hadoop project.
> >>>>>
> >>>>> There may be other options that I haven't thought of. Please feel
> >>>>> free to suggest any you think of.
> >>>>>
> >>>>> Questions? Thoughts? Let the discussion begin.
> >>>>>
> >>>>> Alan.

--001636e9083dfa8e8304838213fa--