incubator-general mailing list archives

From Chris Douglas <cdoug...@apache.org>
Subject Re: [VOTE] Accept Tez into Incubator
Date Thu, 21 Feb 2013 21:00:05 GMT
+1 (binding) -C

On Tue, Feb 19, 2013 at 8:26 PM, Arun C Murthy <acm@hortonworks.com> wrote:
> Hi Folks,
>
> Thanks for participating in the discussion. I'd like to call a VOTE for acceptance of Apache Tez into the Incubator. I'll let the vote run through this weekend (Sun 2/24, 6pm PST).
>
> [ ]  +1 Accept Apache Tez into the Incubator
> [ ]  +0 Don't care.
> [ ]  -1 Don't accept Apache Tez into the Incubator because...
>
> Full proposal is pasted at the bottom of this email, and the corresponding wiki is http://wiki.apache.org/incubator/TezProposal.
>
> Only VOTEs from Incubator PMC members are binding, but all are welcome to express their thoughts.
>
> Here's my +1 (binding).
>
> thanks,
> Arun
>
> PS: From the initial discussion, the only changes are that I've added one new mentor and 2 new committers. All the new additions come from the non-major employer while we continue to strive to further diversify during the incubation. Thanks.
>
> ----
>
> = Tez =
>
> == Abstract ==
> Tez is an effort to develop a generic application framework which can be used
> to process arbitrarily complex data-processing tasks and also a re-usable set
> of data-processing primitives which can be used by other projects.
>
> == Proposal ==
> Tez is a proposal to develop a generic application which can be used to
> process complex data-processing task DAGs and runs natively on Apache Hadoop
> YARN. YARN is a generic resource-management system on which
> applications like MapReduce already exist. MapReduce is a specific, and
> constrained, DAG - which is not optimal for several frameworks like Apache Hive
> and Apache Pig. Furthermore, we propose to develop a re-usable set of
> libraries of data-processing primitives such as sorting, merging,
> data-shuffling, intermediate data management etc. which are necessary for Tez
> and which we envision can be used directly by other projects.
>
> == Background ==
> Apache Hadoop MapReduce has emerged as the assembly-language on which other
> frameworks like Apache Pig and Apache Hive have been built. However, it has
> been well accepted that MapReduce produces very constrained task DAGs for each
> job which results in Apache Pig and Apache Hive requiring multiple MapReduce
> jobs for several queries. By providing a more expressive DAG of tasks for a
> job, Tez attempts to provide significantly enhanced data-processing
> capabilities for projects like Apache Pig, Apache Hive, Cascading etc.
>
> == Rationale ==
> There is an important gap that Tez fills in the Apache Hadoop ecosystem of
> allowing for more expressive task DAGs for data-processing applications such
> as Apache Pig, Apache Hive, Cascading etc.
>
> With the emergence of Apache Hadoop YARN, there is a strong need for a
> common DAG application which can then be shared by Apache Pig, Apache Hive,
> Cascading etc.
>
> == Initial Goals ==
> The initial goals for this project are to specify the detailed requirements
> and architecture, and then develop the initial implementation including the
> DAG ApplicationMaster to run natively inside Apache Hadoop YARN.
>
> == Current Status ==
> Significant work has been completed to identify the initial requirements and
> define the overall system architecture. There is a patch available in the
> internal Hortonworks git repository which can act as the initial seed.
>
> === Meritocracy ===
> We plan to invest in supporting a meritocracy. We will discuss the requirements
> in an open forum. Several companies have already expressed interest in this
> project, and we intend to invite additional developers to participate.
> We will encourage and monitor community participation so that privileges can be
> extended to those that contribute.
>
> === Community ===
> The need for a generic DAG application for data processing in open source is
> tremendous, so there is a potential for a very large community. We believe
> that Tez's extensible architecture will further encourage community participation.
> Also, related Apache projects (e.g., Pig, Hive) have very large and active
> communities, and we expect that over time Tez will also attract a large community.
>
> === Core Developers ===
> The developers on the initial committers list include people very experienced
> in the Apache Hadoop ecosystem:
>
>  * Alan Gates <gates at apache dot org>
>  * Arun C Murthy <acmurthy at apache dot org>
>  * Ashutosh Chauhan <hashutosh at apache dot org>
>  * Bikas Saha <bikas at apache dot org>
>  * Chris Douglas <cdouglas at apache dot org>
>  * Daryn Sharp <daryn at apache dot org>
>  * Devaraj Das <ddas at apache dot org>
>  * Gopal Vijayaraghavan <gopal at hortonworks dot com>
>  * Gunther Hagleitner <ghagleitner at hortonworks dot com>
>  * Hitesh Shah <hitesh at apache dot org>
>  * Jason Lowe <jlowe at apache dot org>
>  * Jean Xu <jeanxu at facebook dot com>
>  * Jitendra Pandey <jitendra at apache dot org>
>  * Julien Le Dem <julien at apache dot org>
>  * Kevin Wilfong <kevinwilfong at apache dot org>
>  * Mike Liddell <mike dot lidell at microsoft dot com>
>  * Namit Jain <namit at apache dot org>
>  * Nathan Roberts <nroberts at yahoo dash inc dot com>
>  * Owen O'Malley <omalley at apache dot org>
>  * Robert Evans <bobby at apache dot org>
>  * Siddharth Seth <sseth at apache dot org>
>  * Tom White <tomwhite at apache dot org>
>  * Thomas Graves <tgraves at apache dot org>
>  * Vikram Dixit <vikram at apache dot org>
>  * Vinod Kumar Vavilapalli <vinodkv at apache dot org>
>  * William Graham <billgraham at apache dot org>
>
> We realize that though we have significant employer diversity already,
> additional diversity is always better, and we will work
> aggressively to recruit developers from additional companies.
>
> === Alignment ===
> The initial committers strongly believe that a standard task DAG
> application on Apache Hadoop YARN will gain broader adoption as an open source,
> community driven project, where the community can contribute not only to the
> core components, but also to a growing collection of applications which will
> be based on top of Tez. Our hope is that the Apache Hive, Apache Pig,
> Cascading and other communities will find tremendous value in Tez and will adopt
> it en masse.
>
> == Known Risks ==
>
> === Orphaned Products ===
> The contributors are leading users and vendors in the Apache Hadoop ecosystem,
> with significant open source experience, so the risk of being orphaned is
> relatively low. The project could be at risk if vendors decided to change
> their strategies in the market. In such an event, the current committers
> plan to continue working on the project on their own time, though the
> progress will likely be slower. We plan to mitigate this risk by
> recruiting additional committers.
>
> === Inexperience with Open Source ===
> The initial committers include veteran Apache members (Committers, PMC members
> and Apache Members) and other developers who have varying degrees of experience
> with open source projects. All have been involved with source code that has
> been released under an open source license, and several also have experience
> developing code with an open source development process.
>
> === Homogeneous Developers ===
> The initial committers are employed by a number of companies, including
> Cloudera, Facebook, Hortonworks, Microsoft, Twitter and Yahoo. We are committed
> to recruiting additional committers from other companies based on their
> contributions to the project even though we do have significant diversity
> already.
>
> === Reliance on Salaried Developers ===
> It is expected that Tez development will occur on both salaried time and on
> volunteer time, after hours. The majority of initial committers are paid by
> their employer to contribute to this project. However, they are all passionate
> about the project, and we are confident that the project will continue even if
> no salaried developers contribute to the project. We are committed to recruiting
> additional committers including non-salaried developers.
>
> === Relationships with Other Apache Products ===
> As mentioned in the Alignment section, Tez is closely integrated with Hadoop,
> Hive and Pig in numerous ways. We look forward to collaborating with
> those communities, as well as other Apache communities.
>
> === An Excessive Fascination with the Apache Brand ===
> Tez solves a real need for generic task DAG management in the Apache Hadoop
> ecosystem, something which has been addressed in a very ad hoc manner so far
> by multiple Apache projects. Our rationale for developing Tez as an Apache
> project is detailed in the Rationale section. We believe that the Apache brand
> and community process will help us attract more contributors to this project,
> and help establish ubiquitous APIs.
>
> == Documentation ==
> http://wiki.apache.org/incubator/TezProposal
>
> == Initial Source ==
> Available as a patch.
>
> == Cryptography ==
> Tez will eventually support encryption on the wire. This is not one of the initial
> goals, and we do not expect Tez to be a controlled export item due to the use
> of encryption.
>
> == Required Resources ==
>
> === Mailing List ===
>  * tez-private
>  * tez-dev
>  * tez-user
>
> === Subversion Directory ===
> Git is the preferred source control system: git://git.apache.org/tez
>
> === Issue Tracking ===
>
> JIRA Tez (TEZ)
>
> == Initial Committers ==
>  * Alan Gates <gates at apache dot org>
>  * Arun C Murthy <acmurthy at apache dot org>
>  * Ashutosh Chauhan <hashutosh at apache dot org>
>  * Bikas Saha <bikas at apache dot org>
>  * Chris Douglas <cdouglas at apache dot org>
>  * Daryn Sharp <daryn at apache dot org>
>  * Devaraj Das <ddas at apache dot org>
>  * Gopal Vijayaraghavan <gopal at hortonworks dot com>
>  * Gunther Hagleitner <ghagleitner at hortonworks dot com>
>  * Hitesh Shah <hitesh at apache dot org>
>  * Jason Lowe <jlowe at apache dot org>
>  * Jean Xu <jeanxu at facebook dot com>
>  * Jitendra Pandey <jitendra at apache dot org>
>  * Julien Le Dem <julien at apache dot org>
>  * Kevin Wilfong <kevinwilfong at apache dot org>
>  * Mike Liddell <mike dot lidell at microsoft dot com>
>  * Namit Jain <namit at apache dot org>
>  * Nathan Roberts <nroberts at yahoo dash inc dot com>
>  * Owen O'Malley <omalley at apache dot org>
>  * Robert Evans <bobby at apache dot org>
>  * Siddharth Seth <sseth at apache dot org>
>  * Tom White <tomwhite at apache dot org>
>  * Thomas Graves <tgraves at apache dot org>
>  * Vikram Dixit <vikram at apache dot org>
>  * Vinod Kumar Vavilapalli <vinodkv at apache dot org>
>  * William Graham <billgraham at apache dot org>
>
> == Affiliations ==
> The initial committers are employees of Cloudera, Facebook, Hortonworks,
> Microsoft, Twitter and Yahoo Inc.
>
>  * Alan Gates - Hortonworks
>  * Arun C Murthy - Hortonworks
>  * Ashutosh Chauhan - Hortonworks
>  * Bikas Saha - Hortonworks
>  * Chris Douglas - Microsoft
>  * Daryn Sharp - Yahoo
>  * Devaraj Das - Hortonworks
>  * Gopal Vijayaraghavan - Hortonworks
>  * Gunther Hagleitner - Hortonworks
>  * Hitesh Shah - Hortonworks
>  * Jason Lowe - Yahoo
>  * Jean Xu - Facebook
>  * Jitendra Pandey - Hortonworks
>  * Julien Le Dem - Twitter
>  * Kevin Wilfong - Facebook
>  * Mike Liddell - Microsoft
>  * Namit Jain - Facebook
>  * Nathan Roberts - Yahoo
>  * Owen O'Malley - Hortonworks
>  * Robert Evans - Yahoo
>  * Siddharth Seth - Hortonworks
>  * Tom White - Cloudera
>  * Thomas Graves - Yahoo
>  * Vikram Dixit - Hortonworks
>  * Vinod Kumar Vavilapalli - Hortonworks
>  * William Graham - Twitter
>
> The nominated mentors are employees of Hortonworks, LinkedIn,
> NASA JPL and Microsoft.
>
>  * Alan Gates - Hortonworks
>  * Arun C Murthy - Hortonworks
>  * Chris Douglas - Microsoft
>  * Chris Mattmann - NASA JPL
>  * Jakob Homan - LinkedIn
>  * Owen O'Malley - Hortonworks
>
> == Sponsors ==
>
> === Champion ===
> Arun C Murthy <acmurthy at apache dot org>
>
> === Nominated Mentors ===
>  * Alan Gates <gates at apache dot org> - Architect at Hortonworks. Committer for Pig.
>  * Arun C Murthy <acmurthy at apache dot org> - Architect at Hortonworks. Committer for Hadoop.
>  * Chris Douglas <cdouglas at apache dot org> - Sr. Research Engineer at Microsoft. Committer for Hadoop.
>  * Chris Mattmann <mattmann at apache dot org> - Sr. Computer Scientist, NASA JPL. Committer for Nutch, OODT and Tika.
>  * Jakob Homan <jghoman at apache dot org> - Sr. Software Engineer, LinkedIn. Committer for Hadoop, Kafka, Giraph.
>  * Owen O'Malley <omalley at apache dot org> - Architect at Hortonworks. Committer for Hadoop, Ambari.
>
> === Sponsoring Entity ===
> Incubator
>


