incubator-general mailing list archives

From Hitesh Shah <hit...@hortonworks.com>
Subject Re: [VOTE] Accept Tez into Incubator
Date Wed, 20 Feb 2013 14:50:50 GMT
+1 (non-binding)

-- Hitesh

On Feb 19, 2013, at 8:26 PM, Arun C Murthy wrote:

> Hi Folks,
> 
> Thanks for participating in the discussion. I'd like to call a VOTE for acceptance of
> Apache Tez into the Incubator. I'll let the vote run through this weekend (until Sun 2/24,
> 6pm PST).
> 
> [ ]  +1 Accept Apache Tez into the Incubator
> [ ]  +0 Don't care.
> [ ]  -1 Don't accept Apache Tez into the Incubator because...
> 
> The full proposal is pasted at the bottom of this email, and the corresponding wiki page is
> http://wiki.apache.org/incubator/TezProposal.
> 
> Only VOTEs from Incubator PMC members are binding, but all are welcome to express their
> thoughts.
> 
> Here's my +1 (binding).
> 
> thanks,
> Arun
> 
> PS: Since the initial discussion, the only changes are that I've added one new mentor
> and 2 new committers. All the new additions come from companies other than the majority
> employer, and we will continue to strive to diversify further during incubation. Thanks.
> 
> ----
> 
> = Tez =
> 
> == Abstract ==
> Tez is an effort to develop a generic application framework which can be used
> to process arbitrarily complex data-processing tasks and also a re-usable set
> of data-processing primitives which can be used by other projects.
> 
> == Proposal ==
> Tez is a proposal to develop a generic application framework which can be used to
> process complex data-processing task DAGs and which runs natively on Apache Hadoop
> YARN. YARN is a generic resource-management system on which applications
> such as MapReduce already run. MapReduce is a specific, and
> constrained, DAG, which is not optimal for several frameworks like Apache Hive
> and Apache Pig. Furthermore, we propose to develop a re-usable set of
> libraries of data-processing primitives such as sorting, merging,
> data-shuffling, intermediate data management etc. which are necessary for Tez
> and which we envision can also be used directly by other projects.
> 
> == Background ==
> Apache Hadoop MapReduce has emerged as the assembly language on which other
> frameworks like Apache Pig and Apache Hive have been built. However, it is
> widely accepted that MapReduce produces very constrained task DAGs for each
> job, which results in Apache Pig and Apache Hive requiring multiple MapReduce
> jobs for several queries. By providing a more expressive DAG of tasks for a
> job, Tez attempts to provide significantly enhanced data-processing
> capabilities for projects like Apache Pig, Apache Hive, Cascading etc.
> 
> == Rationale ==
> Tez fills an important gap in the Apache Hadoop ecosystem by allowing for
> more expressive task DAGs for data-processing applications such
> as Apache Pig, Apache Hive, Cascading etc.
> 
> With the emergence of Apache Hadoop YARN, there is a strong need for a
> common DAG application which can then be shared by Apache Pig, Apache Hive,
> Cascading etc.
> 
> == Initial Goals ==
> The initial goals for this project are to specify the detailed requirements
> and architecture, and then develop the initial implementation including the
> DAG ApplicationMaster to run natively inside Apache Hadoop YARN. 
> 
> == Current Status ==
> Significant work has been completed to identify the initial requirements and
> define the overall system architecture. There is a patch available in the
> internal Hortonworks git repository which can act as the initial seed. 
> 
> === Meritocracy ===
> We plan to invest in supporting a meritocracy. We will discuss the requirements 
> in an open forum. Several companies have already expressed interest in this 
> project, and we intend to invite additional developers to participate. 
> We will encourage and monitor community participation so that privileges can be 
> extended to those that contribute. 
> 
> === Community ===
> The need for a generic DAG application for data processing in open source is 
> tremendous, so there is a potential for a very large community. We believe
> that Tez's extensible architecture will further encourage community participation. 
> Also, related Apache projects (e.g., Pig, Hive) have very large and active 
> communities, and we expect that over time Tez will also attract a large community.
> 
> === Core Developers ===
> The developers on the initial committers list include people very experienced
> in the Apache Hadoop ecosystem:
> 
> * Alan Gates <gates at apache dot org>
> * Arun C Murthy <acmurthy at apache dot org>
> * Ashutosh Chauhan <hashutosh at apache dot org>
> * Bikas Saha <bikas at apache dot org>
> * Chris Douglas <cdouglas at apache dot org>
> * Daryn Sharp <daryn at apache dot org>
> * Devaraj Das <ddas at apache dot org>
> * Gopal Vijayaraghavan <gopal at hortonworks dot com>
> * Gunther Hagleitner <ghagleitner at hortonworks dot com>
> * Hitesh Shah <hitesh at apache dot org>
> * Jason Lowe <jlowe at apache dot org>
> * Jean Xu <jeanxu at facebook dot com>
> * Jitendra Pandey <jitendra at apache dot org>
> * Julien Le Dem <julien at apache dot org>
> * Kevin Wilfong <kevinwilfong at apache dot org>
> * Mike Liddell <mike dot lidell at microsoft dot com>
> * Namit Jain <namit at apache dot org>
> * Nathan Roberts <nroberts at yahoo dash inc dot com>
> * Owen O'Malley <omalley at apache dot org>
> * Robert Evans <bobby at apache dot org>
> * Siddharth Seth <sseth at apache dot org>
> * Tom White <tomwhite at apache dot org>
> * Thomas Graves <tgraves at apache dot org>
> * Vikram Dixit <vikram at apache dot org>
> * Vinod Kumar Vavilapalli <vinodkv at apache dot org>
> * William Graham <billgraham at apache dot org>
> 
> We realize that though we have significant employer diversity already, 
> additional diversity is always better, and we will work 
> aggressively to recruit developers from additional companies.
> 
> === Alignment ===
> The initial committers strongly believe that a standard task DAG 
> application on Apache Hadoop YARN will gain broader adoption as an open source, 
> community driven project, where the community can contribute not only to the 
> core components, but also to a growing collection of applications which will
> be based on top of Tez. Our hope is that the Apache Hive, Apache Pig,
> Cascading and other communities will find tremendous value in Tez and will adopt 
> it en masse. 
> 
> == Known Risks ==
> 
> === Orphaned Products ===
> The contributors are leading users and vendors in the Apache Hadoop ecosystem, 
> with significant open source experience, so the risk of being orphaned is 
> relatively low. The project could be at risk if vendors decided to change 
> their strategies in the market. In such an event, the current committers 
> plan to continue working on the project on their own time, though the 
> progress will likely be slower. We plan to mitigate this risk by 
> recruiting additional committers.
> 
> === Inexperience with Open Source ===
> The initial committers include veteran Apache members (Committers, PMC members
> and Apache Members) and other developers who have varying degrees of experience 
> with open source projects. All have been involved with source code that has 
> been released under an open source license, and several also have experience 
> developing code with an open source development process.
> 
> === Homogenous Developers ===
> The initial committers are employed by a number of companies, including
> Cloudera, Facebook, Hortonworks, Microsoft, Twitter and Yahoo. We are committed 
> to recruiting additional committers from other companies based on their 
> contributions to the project even though we do have significant diversity
> already. 
> 
> === Reliance on Salaried Developers ===
> It is expected that Tez development will occur on both salaried time and on 
> volunteer time, after hours. The majority of initial committers are paid by 
> their employer to contribute to this project. However, they are all passionate 
> about the project, and we are confident that the project will continue even if 
> no salaried developers contribute to the project. We are committed to recruiting 
> additional committers including non-salaried developers.
> 
> === Relationships with Other Apache Products ===
> As mentioned in the Alignment section, Tez is closely integrated with Hadoop,
> Hive and Pig in numerous ways. We look forward to collaborating with 
> those communities, as well as other Apache communities. 
> 
> === An Excessive Fascination with the Apache Brand ===
> Tez solves a real need for generic task DAG management in the Apache Hadoop
> ecosystem, something which has been addressed in a very ad hoc manner so far
> by multiple Apache projects. Our rationale for developing Tez as an Apache 
> project is detailed in the Rationale section. We believe that the Apache brand 
> and community process will help us attract more contributors to this project, 
> and help establish ubiquitous APIs. 
> 
> == Documentation ==
> http://wiki.apache.org/incubator/TezProposal
> 
> == Initial Source ==
> Available as a patch.
> 
> == Cryptography ==
> Tez will eventually support encryption on the wire. This is not one of the initial 
> goals, and we do not expect Tez to be a controlled export item due to the use 
> of encryption.
> 
> == Required Resources ==
> 
> === Mailing List ===
> * tez-private
> * tez-dev
> * tez-user
> 
> === Subversion Directory ===
> Git is the preferred source control system: git://git.apache.org/tez
> 
> === Issue Tracking ===
> 
> JIRA Tez (TEZ) 
> 
> == Initial Committers ==
> * Alan Gates <gates at apache dot org>
> * Arun C Murthy <acmurthy at apache dot org>
> * Ashutosh Chauhan <hashutosh at apache dot org>
> * Bikas Saha <bikas at apache dot org>
> * Chris Douglas <cdouglas at apache dot org>
> * Daryn Sharp <daryn at apache dot org>
> * Devaraj Das <ddas at apache dot org>
> * Gopal Vijayaraghavan <gopal at hortonworks dot com>
> * Gunther Hagleitner <ghagleitner at hortonworks dot com>
> * Hitesh Shah <hitesh at apache dot org>
> * Jason Lowe <jlowe at apache dot org>
> * Jean Xu <jeanxu at facebook dot com>
> * Jitendra Pandey <jitendra at apache dot org>
> * Julien Le Dem <julien at apache dot org>
> * Kevin Wilfong <kevinwilfong at apache dot org>
> * Mike Liddell <mike dot lidell at microsoft dot com>
> * Namit Jain <namit at apache dot org>
> * Nathan Roberts <nroberts at yahoo dash inc dot com>
> * Owen O'Malley <omalley at apache dot org>
> * Robert Evans <bobby at apache dot org>
> * Siddharth Seth <sseth at apache dot org>
> * Tom White <tomwhite at apache dot org>
> * Thomas Graves <tgraves at apache dot org>
> * Vikram Dixit <vikram at apache dot org>
> * Vinod Kumar Vavilapalli <vinodkv at apache dot org>
> * William Graham <billgraham at apache dot org>
> 
> == Affiliations ==
> The initial committers are employees of Cloudera, Facebook, Hortonworks,
> Microsoft, Twitter and Yahoo Inc. 
> 
> * Alan Gates - Hortonworks 
> * Arun C Murthy - Hortonworks 
> * Ashutosh Chauhan - Hortonworks 
> * Bikas Saha - Hortonworks 
> * Chris Douglas - Microsoft 
> * Daryn Sharp - Yahoo 
> * Devaraj Das - Hortonworks 
> * Gopal Vijayaraghavan - Hortonworks 
> * Gunther Hagleitner - Hortonworks 
> * Hitesh Shah - Hortonworks 
> * Jason Lowe - Yahoo 
> * Jean Xu - Facebook 
> * Jitendra Pandey - Hortonworks 
> * Julien Le Dem - Twitter
> * Kevin Wilfong - Facebook 
> * Mike Liddell - Microsoft 
> * Namit Jain - Facebook 
> * Nathan Roberts - Yahoo 
> * Owen O'Malley - Hortonworks
> * Robert Evans - Yahoo 
> * Siddharth Seth - Hortonworks 
> * Tom White - Cloudera 
> * Thomas Graves - Yahoo 
> * Vikram Dixit - Hortonworks 
> * Vinod Kumar Vavilapalli - Hortonworks 
> * William Graham - Twitter 
> 
> The nominated mentors are employees of Hortonworks, LinkedIn, 
> NASA JPL and Microsoft.
> 
> * Alan Gates - Hortonworks 
> * Arun C Murthy - Hortonworks 
> * Chris Douglas - Microsoft 
> * Chris Mattmann - NASA JPL 
> * Jakob Homan - LinkedIn 
> * Owen O'Malley - Hortonworks 
> 
> == Sponsors ==
> 
> === Champion ===
> Arun C Murthy <acmurthy at apache dot org>
> 
> === Nominated Mentors ===
> * Alan Gates <gates at apache dot org> - Architect at Hortonworks. Committer for Pig.
> * Arun C Murthy <acmurthy at apache dot org> - Architect at Hortonworks. Committer for Hadoop.
> * Chris Douglas <cdouglas at apache dot org> - Sr. Research Engineer at Microsoft. Committer for Hadoop.
> * Chris Mattmann <mattmann at apache dot org> - Sr. Computer Scientist, NASA JPL. Committer for Nutch, OODT and Tika.
> * Jakob Homan <jghoman at apache dot org> - Sr. Software Engineer, LinkedIn. Committer for Hadoop, Kafka, Giraph.
> * Owen O'Malley <omalley at apache dot org> - Architect at Hortonworks. Committer for Hadoop, Ambari.
> 
> === Sponsoring Entity ===
> Incubator
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

