incubator-general mailing list archives

From Mahadev Konar <maha...@hortonworks.com>
Subject Re: [VOTE] Accept Tez into Incubator
Date Fri, 22 Feb 2013 03:10:33 GMT
+1 (binding).


On Feb 21, 2013, at 1:41 PM, Olivier Lamy wrote:

> +1
> 
> 2013/2/20 Arun C Murthy <acm@hortonworks.com>:
>> Hi Folks,
>> 
>> Thanks for participating in the discussion. I'd like to call a VOTE for acceptance of Apache Tez into the Incubator. I'll let the vote run through this weekend (Sun 2/24 6pm PST).
>> 
>> [ ]  +1 Accept Apache Tez into the Incubator
>> [ ]  +0 Don't care.
>> [ ]  -1 Don't accept Apache Tez into the Incubator because...
>> 
>> Full proposal is pasted at the bottom of this email, and the corresponding wiki is http://wiki.apache.org/incubator/TezProposal.
>> 
>> Only VOTEs from Incubator PMC members are binding, but all are welcome to express their thoughts.
>> 
>> Here's my +1 (binding).
>> 
>> thanks,
>> Arun
>> 
>> PS: From the initial discussion, the only changes are that I've added one new mentor and 2 new committers. All the new additions come from the non-major employer while we continue to strive to further diversify during the incubation. Thanks.
>> 
>> ----
>> 
>> = Tez =
>> 
>> == Abstract ==
>> Tez is an effort to develop a generic application framework which can be used
>> to process arbitrarily complex data-processing tasks and also a re-usable set
>> of data-processing primitives which can be used by other projects.
>> 
>> == Proposal ==
>> Tez is a proposal to develop a generic application which can be used to
>> process complex data-processing task DAGs and runs natively on Apache Hadoop
>> YARN. YARN is a generic resource-management system on which
>> applications like MapReduce already exist. MapReduce is a specific, and
>> constrained, DAG - which is not optimal for several frameworks like Apache Hive
>> and Apache Pig. Furthermore, we propose to develop a re-usable set of
>> libraries of data-processing primitives such as sorting, merging,
>> data-shuffling, intermediate data management etc. which are necessary for Tez
>> and which we envision can be used directly by other projects.
>> 
>> == Background ==
>> Apache Hadoop MapReduce has emerged as the assembly-language on which other
>> frameworks like Apache Pig and Apache Hive have been built. However, it has
>> been well accepted that MapReduce produces very constrained task DAGs for each
>> job which results in Apache Pig and Apache Hive requiring multiple MapReduce
>> jobs for several queries. By providing a more expressive DAG of tasks for a
>> job, Tez attempts to provide significantly enhanced data-processing
>> capabilities for projects like Apache Pig, Apache Hive, Cascading etc.
>> 
>> == Rationale ==
>> Tez fills an important gap in the Apache Hadoop ecosystem by
>> allowing for more expressive task DAGs for data-processing applications such
>> as Apache Pig, Apache Hive, Cascading etc.
>> 
>> With the emergence of Apache Hadoop YARN, there is a strong need for a
>> common DAG application which can then be shared by Apache Pig, Apache Hive,
>> Cascading etc.
>> 
>> == Initial Goals ==
>> The initial goals for this project are to specify the detailed requirements
>> and architecture, and then develop the initial implementation including the
>> DAG ApplicationMaster to run natively inside Apache Hadoop YARN.
>> 
>> == Current Status ==
>> Significant work has been completed to identify the initial requirements and
>> define the overall system architecture. There is a patch available in the
>> internal Hortonworks git repository which can act as the initial seed.
>> 
>> === Meritocracy ===
>> We plan to invest in supporting a meritocracy. We will discuss the requirements
>> in an open forum. Several companies have already expressed interest in this
>> project, and we intend to invite additional developers to participate.
>> We will encourage and monitor community participation so that privileges can be
>> extended to those that contribute.
>> 
>> === Community ===
>> The need for a generic DAG application for data processing in open source is
>> tremendous, so there is a potential for a very large community. We believe
>> that Tez's extensible architecture will further encourage community participation.
>> Also, related Apache projects (e.g., Pig, Hive) have very large and active
>> communities, and we expect that over time Tez will also attract a large community.
>> 
>> === Core Developers ===
>> The developers on the initial committers list include people very experienced
>> in the Apache Hadoop ecosystem:
>> 
>> * Alan Gates <gates at apache dot org>
>> * Arun C Murthy <acmurthy at apache dot org>
>> * Ashutosh Chauhan <hashutosh at apache dot org>
>> * Bikas Saha <bikas at apache dot org>
>> * Chris Douglas <cdouglas at apache dot org>
>> * Daryn Sharp <daryn at apache dot org>
>> * Devaraj Das <ddas at apache dot org>
>> * Gopal Vijayaraghavan <gopal at hortonworks dot com>
>> * Gunther Hagleitner <ghagleitner at hortonworks dot com>
>> * Hitesh Shah <hitesh at apache dot org>
>> * Jason Lowe <jlowe at apache dot org>
>> * Jean Xu <jeanxu at facebook dot com>
>> * Jitendra Pandey <jitendra at apache dot org>
>> * Julien Le Dem <julien at apache dot org>
>> * Kevin Wilfong <kevinwilfong at apache dot org>
>> * Mike Liddell <mike dot lidell at microsoft dot com>
>> * Namit Jain <namit at apache dot org>
>> * Nathan Roberts <nroberts at yahoo dash inc dot com>
>> * Owen O'Malley <omalley at apache dot org>
>> * Robert Evans <bobby at apache dot org>
>> * Siddharth Seth <sseth at apache dot org>
>> * Tom White <tomwhite at apache dot org>
>> * Thomas Graves <tgraves at apache dot org>
>> * Vikram Dixit <vikram at apache dot org>
>> * Vinod Kumar Vavilapalli <vinodkv at apache dot org>
>> * William Graham <billgraham at apache dot org>
>> 
>> We realize that though we have significant employer diversity already,
>> additional diversity is always better, and we will work
>> aggressively to recruit developers from additional companies.
>> 
>> === Alignment ===
>> The initial committers strongly believe that a standard task DAG
>> application on Apache Hadoop YARN will gain broader adoption as an open source,
>> community driven project, where the community can contribute not only to the
>> core components, but also to a growing collection of applications which will
>> be based on top of Tez. Our hope is that the Apache Hive, Apache Pig,
>> Cascading and other communities will find tremendous value in Tez and will adopt
>> it en masse.
>> 
>> == Known Risks ==
>> 
>> === Orphaned Products ===
>> The contributors are leading users and vendors in the Apache Hadoop ecosystem,
>> with significant open source experience, so the risk of being orphaned is
>> relatively low. The project could be at risk if vendors decided to change
>> their strategies in the market. In such an event, the current committers
>> plan to continue working on the project on their own time, though the
>> progress will likely be slower. We plan to mitigate this risk by
>> recruiting additional committers.
>> 
>> === Inexperience with Open Source ===
>> The initial committers include veteran Apache members (Committers, PMC members
>> and Apache Members) and other developers who have varying degrees of experience
>> with open source projects. All have been involved with source code that has
>> been released under an open source license, and several also have experience
>> developing code with an open source development process.
>> 
>> === Homogeneous Developers ===
>> The initial committers are employed by a number of companies, including
>> Cloudera, Facebook, Hortonworks, Microsoft, Twitter and Yahoo. We are committed
>> to recruiting additional committers from other companies based on their
>> contributions to the project even though we do have significant diversity
>> already.
>> 
>> === Reliance on Salaried Developers ===
>> It is expected that Tez development will occur on both salaried time and on
>> volunteer time, after hours. The majority of initial committers are paid by
>> their employer to contribute to this project. However, they are all passionate
>> about the project, and we are confident that the project will continue even if
>> no salaried developers contribute to the project. We are committed to recruiting
>> additional committers including non-salaried developers.
>> 
>> === Relationships with Other Apache Products ===
>> As mentioned in the Alignment section, Tez is closely integrated with Hadoop,
>> Hive and Pig in numerous ways. We look forward to collaborating with
>> those communities, as well as other Apache communities.
>> 
>> === An Excessive Fascination with the Apache Brand ===
>> Tez solves a real need for generic task DAG management in the Apache Hadoop
>> ecosystem, something which has been addressed in a very ad hoc manner so far
>> by multiple Apache projects. Our rationale for developing Tez as an Apache
>> project is detailed in the Rationale section. We believe that the Apache brand
>> and community process will help us attract more contributors to this project,
>> and help establish ubiquitous APIs.
>> 
>> == Documentation ==
>> http://wiki.apache.org/incubator/TezProposal
>> 
>> == Initial Source ==
>> Available as a patch.
>> 
>> == Cryptography ==
>> Tez will eventually support encryption on the wire. This is not one of the initial
>> goals, and we do not expect Tez to be a controlled export item due to the use
>> of encryption.
>> 
>> == Required Resources ==
>> 
>> === Mailing List ===
>> * tez-private
>> * tez-dev
>> * tez-user
>> 
>> === Subversion Directory ===
>> Git is the preferred source control system: git://git.apache.org/tez
>> 
>> === Issue Tracking ===
>> 
>> JIRA Tez (TEZ)
>> 
>> == Initial Committers ==
>> * Alan Gates <gates at apache dot org>
>> * Arun C Murthy <acmurthy at apache dot org>
>> * Ashutosh Chauhan <hashutosh at apache dot org>
>> * Bikas Saha <bikas at apache dot org>
>> * Chris Douglas <cdouglas at apache dot org>
>> * Daryn Sharp <daryn at apache dot org>
>> * Devaraj Das <ddas at apache dot org>
>> * Gopal Vijayaraghavan <gopal at hortonworks dot com>
>> * Gunther Hagleitner <ghagleitner at hortonworks dot com>
>> * Hitesh Shah <hitesh at apache dot org>
>> * Jason Lowe <jlowe at apache dot org>
>> * Jean Xu <jeanxu at facebook dot com>
>> * Jitendra Pandey <jitendra at apache dot org>
>> * Julien Le Dem <julien at apache dot org>
>> * Kevin Wilfong <kevinwilfong at apache dot org>
>> * Mike Liddell <mike dot lidell at microsoft dot com>
>> * Namit Jain <namit at apache dot org>
>> * Nathan Roberts <nroberts at yahoo dash inc dot com>
>> * Owen O'Malley <omalley at apache dot org>
>> * Robert Evans <bobby at apache dot org>
>> * Siddharth Seth <sseth at apache dot org>
>> * Tom White <tomwhite at apache dot org>
>> * Thomas Graves <tgraves at apache dot org>
>> * Vikram Dixit <vikram at apache dot org>
>> * Vinod Kumar Vavilapalli <vinodkv at apache dot org>
>> * William Graham <billgraham at apache dot org>
>> 
>> == Affiliations ==
>> The initial committers are employees of Cloudera, Facebook, Hortonworks,
>> Microsoft, Twitter and Yahoo Inc.
>> 
>> * Alan Gates - Hortonworks
>> * Arun C Murthy - Hortonworks
>> * Ashutosh Chauhan - Hortonworks
>> * Bikas Saha - Hortonworks
>> * Chris Douglas - Microsoft
>> * Daryn Sharp - Yahoo
>> * Devaraj Das - Hortonworks
>> * Gopal Vijayaraghavan - Hortonworks
>> * Gunther Hagleitner - Hortonworks
>> * Hitesh Shah - Hortonworks
>> * Jason Lowe - Yahoo
>> * Jean Xu - Facebook
>> * Jitendra Pandey - Hortonworks
>> * Julien Le Dem - Twitter
>> * Kevin Wilfong - Facebook
>> * Mike Liddell - Microsoft
>> * Namit Jain - Facebook
>> * Nathan Roberts - Yahoo
>> * Owen O'Malley - Hortonworks
>> * Robert Evans - Yahoo
>> * Siddharth Seth - Hortonworks
>> * Tom White - Cloudera
>> * Thomas Graves - Yahoo
>> * Vikram Dixit - Hortonworks
>> * Vinod Kumar Vavilapalli - Hortonworks
>> * William Graham - Twitter
>> 
>> The nominated mentors are employees of Hortonworks, LinkedIn,
>> NASA JPL and Microsoft.
>> 
>> * Alan Gates - Hortonworks
>> * Arun C Murthy - Hortonworks
>> * Chris Douglas - Microsoft
>> * Chris Mattmann - NASA JPL
>> * Jakob Homan - LinkedIn
>> * Owen O'Malley - Hortonworks
>> 
>> == Sponsors ==
>> 
>> === Champion ===
>> Arun C Murthy <acmurthy at apache dot org>
>> 
>> === Nominated Mentors ===
>> * Alan Gates <gates at apache dot org> - Architect at Hortonworks. Committer for Pig.
>> * Arun C Murthy <acmurthy at apache dot org> - Architect at Hortonworks. Committer for Hadoop.
>> * Chris Douglas <cdouglas at apache dot org> - Sr. Research Engineer at Microsoft. Committer for Hadoop.
>> * Chris Mattmann <mattmann at apache dot org> - Sr. Computer Scientist, NASA JPL. Committer for Nutch, OODT and Tika.
>> * Jakob Homan <jghoman at apache dot org> - Sr. Software Engineer, LinkedIn. Committer for Hadoop, Kafka, Giraph.
>> * Owen O'Malley <omalley at apache dot org> - Architect at Hortonworks. Committer for Hadoop, Ambari.
>> 
>> === Sponsoring Entity ===
>> Incubator
>> 
> 
> 
> 
> --
> Olivier Lamy
> Talend: http://coders.talend.com
> http://twitter.com/olamy | http://linkedin.com/in/olamy
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
> 



