From: Alejandro Abdelnur
Date: Wed, 20 Feb 2013 08:26:49 -0800
Subject: Re: [VOTE] Accept Tez into Incubator
To: general@incubator.apache.org

+1 (non-binding). Glad to see that the idea of having a DAG AM is finally
getting traction.

Arun, would you please clarify how Tez is (conceptually) different from the
Workflow AM proposed in MAPREDUCE-4495/OOZIE-1178?

On Wed, Feb 20, 2013 at 6:50 AM, Hitesh Shah wrote:

> +1 (non-binding)
>
> -- Hitesh
>
> On Feb 19, 2013, at 8:26 PM, Arun C Murthy wrote:
>
> > Hi Folks,
> >
> > Thanks for participating in the discussion. I'd like to call a VOTE for
> > acceptance of Apache Tez into the Incubator. I'll let the vote run until
> > this weekend (Sun 2/24, 6pm PST).
> >
> > [ ] +1 Accept Apache Tez into the Incubator
> > [ ] +0 Don't care.
> > [ ] -1 Don't accept Apache Tez into the Incubator because...
> >
> > The full proposal is pasted at the bottom of this email, and the
> > corresponding wiki is http://wiki.apache.org/incubator/TezProposal.
> >
> > Only VOTEs from Incubator PMC members are binding, but all are welcome
> > to express their thoughts.
> >
> > Here's my +1 (binding).
> >
> > thanks,
> > Arun
> >
> > PS: Since the initial discussion, the only changes are that I've added
> > one new mentor and two new committers.
> > All the new additions come from employers other than the majority
> > employer, and we will continue to strive to diversify further during
> > incubation. Thanks.
> >
> > ----
> >
> > = Tez =
> >
> > == Abstract ==
> > Tez is an effort to develop a generic application framework which can be
> > used to process arbitrarily complex data-processing tasks, and also a
> > re-usable set of data-processing primitives which can be used by other
> > projects.
> >
> > == Proposal ==
> > Tez is a proposal to develop a generic application which can be used to
> > process complex data-processing task DAGs and runs natively on Apache
> > Hadoop YARN. YARN is a generic resource-management system on which
> > applications like MapReduce already run. MapReduce is a specific,
> > constrained DAG, which is not optimal for several frameworks like Apache
> > Hive and Apache Pig. Furthermore, we propose to develop a re-usable set of
> > libraries of data-processing primitives, such as sorting, merging,
> > data-shuffling and intermediate data management, which are necessary for
> > Tez and which we envision can be used directly by other projects.
> >
> > == Background ==
> > Apache Hadoop MapReduce has emerged as the assembly language on which
> > other frameworks like Apache Pig and Apache Hive have been built. However,
> > it is widely accepted that MapReduce produces very constrained task DAGs
> > for each job, which results in Apache Pig and Apache Hive requiring
> > multiple MapReduce jobs for several queries. By providing a more
> > expressive DAG of tasks for a job, Tez attempts to provide significantly
> > enhanced data-processing capabilities for projects like Apache Pig,
> > Apache Hive, Cascading etc.
> >
> > == Rationale ==
> > Tez fills an important gap in the Apache Hadoop ecosystem by allowing
> > more expressive task DAGs for data-processing applications such as
> > Apache Pig, Apache Hive, Cascading etc.
> >
> > With the emergence of Apache Hadoop YARN, there is a strong need for a
> > common DAG application which can then be shared by Apache Pig, Apache
> > Hive, Cascading etc.
> >
> > == Initial Goals ==
> > The initial goals for this project are to specify the detailed
> > requirements and architecture, and then develop the initial
> > implementation, including the DAG ApplicationMaster, to run natively
> > inside Apache Hadoop YARN.
> >
> > == Current Status ==
> > Significant work has been completed to identify the initial requirements
> > and define the overall system architecture. There is a patch available in
> > the internal Hortonworks git repository which can act as the initial
> > seed.
> >
> > === Meritocracy ===
> > We plan to invest in supporting a meritocracy. We will discuss the
> > requirements in an open forum. Several companies have already expressed
> > interest in this project, and we intend to invite additional developers
> > to participate. We will encourage and monitor community participation so
> > that privileges can be extended to those that contribute.
> >
> > === Community ===
> > The need for a generic DAG application for data processing in open
> > source is tremendous, so there is a potential for a very large community.
> > We believe that Tez's extensible architecture will further encourage
> > community participation. Also, related Apache projects (e.g., Pig, Hive)
> > have very large and active communities, and we expect that over time Tez
> > will also attract a large community.
> >
> > === Core Developers ===
> > The developers on the initial committers list include people very
> > experienced in the Apache Hadoop ecosystem:
> >
> > * Alan Gates
> > * Arun C Murthy
> > * Ashutosh Chauhan
> > * Bikas Saha
> > * Chris Douglas
> > * Daryn Sharp
> > * Devaraj Das
> > * Gopal Vijayaraghavan
> > * Gunther Hagleitner
> > * Hitesh Shah
> > * Jason Lowe
> > * Jean Xu
> > * Jitendra Pandey
> > * Julien Le Dem
> > * Kevin Wilfong
> > * Mike Liddell
> > * Namit Jain
> > * Nathan Roberts
> > * Owen O'Malley
> > * Robert Evans
> > * Siddharth Seth
> > * Tom White
> > * Thomas Graves
> > * Vikram Dixit
> > * Vinod Kumar Vavilapalli
> > * William Graham
> >
> > We realize that though we have significant employer diversity already,
> > additional diversity is always better, and we will work aggressively to
> > recruit developers from additional companies.
> >
> > === Alignment ===
> > The initial committers strongly believe that a standard task DAG
> > application on Apache Hadoop YARN will gain broader adoption as an open
> > source, community-driven project, where the community can contribute not
> > only to the core components, but also to a growing collection of
> > applications which will be built on top of Tez. Our hope is that the
> > Apache Hive, Apache Pig, Cascading and other communities will find
> > tremendous value in Tez and will adopt it en masse.
> >
> > == Known Risks ==
> >
> > === Orphaned Products ===
> > The contributors are leading users and vendors in the Apache Hadoop
> > ecosystem, with significant open source experience, so the risk of being
> > orphaned is relatively low. The project could be at risk if vendors
> > decided to change their strategies in the market. In such an event, the
> > current committers plan to continue working on the project on their own
> > time, though the progress will likely be slower.
> > We plan to mitigate this risk by recruiting additional committers.
> >
> > === Inexperience with Open Source ===
> > The initial committers include veteran Apache members (Committers, PMC
> > members and Apache Members) and other developers who have varying degrees
> > of experience with open source projects. All have been involved with
> > source code that has been released under an open source license, and
> > several also have experience developing code with an open source
> > development process.
> >
> > === Homogenous Developers ===
> > The initial committers are employed by a number of companies, including
> > Cloudera, Facebook, Hortonworks, Microsoft, Twitter and Yahoo. We are
> > committed to recruiting additional committers from other companies based
> > on their contributions to the project, even though we do have significant
> > diversity already.
> >
> > === Reliance on Salaried Developers ===
> > It is expected that Tez development will occur on both salaried time and
> > volunteer time, after hours. The majority of initial committers are paid
> > by their employer to contribute to this project. However, they are all
> > passionate about the project, and we are confident that the project will
> > continue even if no salaried developers contribute to the project. We are
> > committed to recruiting additional committers, including non-salaried
> > developers.
> >
> > === Relationships with Other Apache Products ===
> > As mentioned in the Alignment section, Tez is closely integrated with
> > Hadoop, Hive and Pig in numerous ways. We look forward to collaborating
> > with those communities, as well as other Apache communities.
> >
> > === An Excessive Fascination with the Apache Brand ===
> > Tez solves a real need for generic task DAG management in the Apache
> > Hadoop ecosystem, something which has been addressed in a very ad hoc
> > manner so far by multiple Apache projects. Our rationale for developing
> > Tez as an Apache project is detailed in the Rationale section. We believe
> > that the Apache brand and community process will help us attract more
> > contributors to this project, and help establish ubiquitous APIs.
> >
> > == Documentation ==
> > http://wiki.apache.org/incubator/TezProposal
> >
> > == Initial Source ==
> > Available as a patch.
> >
> > == Cryptography ==
> > Tez will eventually support encryption on the wire. This is not one of
> > the initial goals, and we do not expect Tez to be a controlled export
> > item due to the use of encryption.
> >
> > == Required Resources ==
> >
> > === Mailing List ===
> > * tez-private
> > * tez-dev
> > * tez-user
> >
> > === Subversion Directory ===
> > Git is the preferred source control system: git://git.apache.org/tez
> >
> > === Issue Tracking ===
> >
> > JIRA Tez (TEZ)
> >
> > == Initial Committers ==
> > * Alan Gates
> > * Arun C Murthy
> > * Ashutosh Chauhan
> > * Bikas Saha
> > * Chris Douglas
> > * Daryn Sharp
> > * Devaraj Das
> > * Gopal Vijayaraghavan
> > * Gunther Hagleitner
> > * Hitesh Shah
> > * Jason Lowe
> > * Jean Xu
> > * Jitendra Pandey
> > * Julien Le Dem
> > * Kevin Wilfong
> > * Mike Liddell
> > * Namit Jain
> > * Nathan Roberts
> > * Owen O'Malley
> > * Robert Evans
> > * Siddharth Seth
> > * Tom White
> > * Thomas Graves
> > * Vikram Dixit
> > * Vinod Kumar Vavilapalli
> > * William Graham
> >
> > == Affiliations ==
> > The initial committers are employees of Cloudera, Facebook, Hortonworks,
> > Microsoft, Twitter and Yahoo Inc.
> >
> > * Alan Gates - Hortonworks
> > * Arun C Murthy - Hortonworks
> > * Ashutosh Chauhan - Hortonworks
> > * Bikas Saha - Hortonworks
> > * Chris Douglas - Microsoft
> > * Daryn Sharp - Yahoo
> > * Devaraj Das - Hortonworks
> > * Gopal Vijayaraghavan - Hortonworks
> > * Gunther Hagleitner - Hortonworks
> > * Hitesh Shah - Hortonworks
> > * Jason Lowe - Yahoo
> > * Jean Xu - Facebook
> > * Jitendra Pandey - Hortonworks
> > * Julien Le Dem - Twitter
> > * Kevin Wilfong - Facebook
> > * Mike Liddell - Microsoft
> > * Namit Jain - Facebook
> > * Nathan Roberts - Yahoo
> > * Owen O'Malley - Hortonworks
> > * Robert Evans - Yahoo
> > * Siddharth Seth - Hortonworks
> > * Tom White - Cloudera
> > * Thomas Graves - Yahoo
> > * Vikram Dixit - Hortonworks
> > * Vinod Kumar Vavilapalli - Hortonworks
> > * William Graham - Twitter
> >
> > The nominated mentors are employees of Hortonworks, LinkedIn,
> > NASA JPL and Microsoft.
> >
> > * Alan Gates - Hortonworks
> > * Arun C Murthy - Hortonworks
> > * Chris Douglas - Microsoft
> > * Chris Mattmann - NASA JPL
> > * Jakob Homan - LinkedIn
> > * Owen O'Malley - Hortonworks
> >
> > == Sponsors ==
> >
> > === Champion ===
> > Arun C Murthy
> >
> > === Nominated Mentors ===
> > * Alan Gates - Architect at Hortonworks. Committer for Pig.
> > * Arun C Murthy - Architect at Hortonworks. Committer for Hadoop.
> > * Chris Douglas - Sr. Research Engineer at Microsoft. Committer for
> >   Hadoop.
> > * Chris Mattmann - Sr. Computer Scientist, NASA JPL. Committer for
> >   Nutch, OODT and Tika.
> > * Jakob Homan - Sr. Software Engineer, LinkedIn. Committer for Hadoop,
> >   Kafka, Giraph.
> > * Owen O'Malley - Architect at Hortonworks. Committer for Hadoop,
> >   Ambari.
> >
> > === Sponsoring Entity ===
> > Incubator

-- 
Alejandro