incubator-general mailing list archives

From Jakob Homan <jgho...@gmail.com>
Subject Re: [VOTE] Accept Tez into Incubator
Date Wed, 20 Feb 2013 16:33:41 GMT
+1 (binding) -jakob


On Wed, Feb 20, 2013 at 8:26 AM, Alejandro Abdelnur <tucu@cloudera.com> wrote:

> +1 (non-binding), glad to see that finally the idea of having a DAG AM is
> getting traction.
>
> Arun, would you please clarify how Tez is (conceptually) different from the
> Workflow AM proposed in MAPREDUCE-4495/OOZIE-1178?
>
>
>
> On Wed, Feb 20, 2013 at 6:50 AM, Hitesh Shah <hitesh@hortonworks.com>
> wrote:
>
> > +1 ( non-binding )
> >
> > -- Hitesh
> >
> > On Feb 19, 2013, at 8:26 PM, Arun C Murthy wrote:
> >
> > > Hi Folks,
> > >
> > > Thanks for participating in the discussion. I'd like to call a VOTE for
> > > acceptance of Apache Tez into the Incubator. I'll let the vote run
> > > through this weekend (until Sun 2/24 6pm PST).
> > >
> > > [ ]  +1 Accept Apache Tez into the Incubator
> > > [ ]  +0 Don't care.
> > > [ ]  -1 Don't accept Apache Tez into the Incubator because...
> > >
> > > Full proposal is pasted at the bottom of this email, and the
> > > corresponding wiki is http://wiki.apache.org/incubator/TezProposal.
> > >
> > > Only VOTEs from Incubator PMC members are binding, but all are welcome
> > > to express their thoughts.
> > >
> > > Here's my +1 (binding).
> > >
> > > thanks,
> > > Arun
> > >
> > > PS: From the initial discussion, the only changes are that I've added
> > > one new mentor and 2 new committers. All the new additions come from
> > > employers other than the major one, and we will continue to strive for
> > > further diversity during the incubation. Thanks.
> > >
> > > ----
> > >
> > > = Tez =
> > >
> > > == Abstract ==
> > > Tez is an effort to develop a generic application framework which can be
> > > used to process arbitrarily complex data-processing tasks and also a
> > > re-usable set of data-processing primitives which can be used by other
> > > projects.
> > >
> > > == Proposal ==
> > > Tez is a proposal to develop a generic application which can be used to
> > > process complex data-processing task DAGs and runs natively on Apache
> > > Hadoop YARN. YARN is a generic resource-management system on which
> > > applications like MapReduce already run. MapReduce is a specific, and
> > > constrained, DAG, which is not optimal for several frameworks like Apache
> > > Hive and Apache Pig. Furthermore, we propose to develop a re-usable set of
> > > libraries of data-processing primitives such as sorting, merging,
> > > data-shuffling, intermediate data management etc., which are necessary for
> > > Tez and which we envision can be used directly by other projects.
> > >
> > > == Background ==
> > > Apache Hadoop MapReduce has emerged as the assembly-language on which
> > > other frameworks like Apache Pig and Apache Hive have been built. However,
> > > it has been well accepted that MapReduce produces very constrained task
> > > DAGs for each job, which results in Apache Pig and Apache Hive requiring
> > > multiple MapReduce jobs for several queries. By providing a more
> > > expressive DAG of tasks for a job, Tez attempts to provide significantly
> > > enhanced data-processing capabilities for projects like Apache Pig, Apache
> > > Hive, Cascading etc.
> > >
> > > == Rationale ==
> > > There is an important gap that Tez fills in the Apache Hadoop ecosystem:
> > > allowing for more expressive task DAGs for data-processing applications
> > > such as Apache Pig, Apache Hive, Cascading etc.
> > >
> > > With the emergence of Apache Hadoop YARN, there is a strong need for a
> > > common DAG application which can then be shared by Apache Pig, Apache
> > > Hive, Cascading etc.
> > >
> > > == Initial Goals ==
> > > The initial goals for this project are to specify the detailed
> > > requirements and architecture, and then develop the initial implementation
> > > including the DAG ApplicationMaster to run natively inside Apache Hadoop
> > > YARN.
> > >
> > > == Current Status ==
> > > Significant work has been completed to identify the initial requirements
> > > and define the overall system architecture. There is a patch available in
> > > the internal Hortonworks git repository which can act as the initial seed.
> > >
> > > === Meritocracy ===
> > > We plan to invest in supporting a meritocracy. We will discuss the
> > > requirements in an open forum. Several companies have already expressed
> > > interest in this project, and we intend to invite additional developers to
> > > participate. We will encourage and monitor community participation so that
> > > privileges can be extended to those that contribute.
> > >
> > > === Community ===
> > > The need for a generic DAG application for data processing in open source
> > > is tremendous, so there is a potential for a very large community. We
> > > believe that Tez's extensible architecture will further encourage
> > > community participation. Also, related Apache projects (e.g., Pig, Hive)
> > > have very large and active communities, and we expect that over time Tez
> > > will also attract a large community.
> > >
> > > === Core Developers ===
> > > The developers on the initial committers list include people very
> > > experienced in the Apache Hadoop ecosystem:
> > >
> > > * Alan Gates <gates at apache dot org>
> > > * Arun C Murthy <acmurthy at apache dot org>
> > > * Ashutosh Chauhan <hashutosh at apache dot org>
> > > * Bikas Saha <bikas at apache dot org>
> > > * Chris Douglas <cdouglas at apache dot org>
> > > * Daryn Sharp <daryn at apache dot org>
> > > * Devaraj Das <ddas at apache dot org>
> > > * Gopal Vijayaraghavan <gopal at hortonworks dot com>
> > > * Gunther Hagleitner <ghagleitner at hortonworks dot com>
> > > * Hitesh Shah <hitesh at apache dot org>
> > > * Jason Lowe <jlowe at apache dot org>
> > > * Jean Xu <jeanxu at facebook dot com>
> > > * Jitendra Pandey <jitendra at apache dot org>
> > > * Julien Le Dem <julien at apache dot org>
> > > * Kevin Wilfong <kevinwilfong at apache dot org>
> > > * Mike Liddell <mike dot lidell at microsoft dot com>
> > > * Namit Jain <namit at apache dot org>
> > > * Nathan Roberts <nroberts at yahoo dash inc dot com>
> > > * Owen O'Malley <omalley at apache dot org>
> > > * Robert Evans <bobby at apache dot org>
> > > * Siddharth Seth <sseth at apache dot org>
> > > * Tom White <tomwhite at apache dot org>
> > > * Thomas Graves <tgraves at apache dot org>
> > > * Vikram Dixit <vikram at apache dot org>
> > > * Vinod Kumar Vavilapalli <vinodkv at apache dot org>
> > > * William Graham <billgraham at apache dot org>
> > >
> > > We realize that though we have significant employer diversity already,
> > > additional diversity is always better, and we will work
> > > aggressively to recruit developers from additional companies.
> > >
> > > === Alignment ===
> > > The initial committers strongly believe that a standard task DAG
> > > application on Apache Hadoop YARN will gain broader adoption as an open
> > > source, community driven project, where the community can contribute not
> > > only to the core components, but also to a growing collection of
> > > applications which will be based on top of Tez. Our hope is that the
> > > Apache Hive, Apache Pig, Cascading and other communities will find
> > > tremendous value in Tez and will adopt it en masse.
> > >
> > > == Known Risks ==
> > >
> > > === Orphaned Products ===
> > > The contributors are leading users and vendors in the Apache Hadoop
> > > ecosystem, with significant open source experience, so the risk of being
> > > orphaned is relatively low. The project could be at risk if vendors
> > > decided to change their strategies in the market. In such an event, the
> > > current committers plan to continue working on the project on their own
> > > time, though the progress will likely be slower. We plan to mitigate this
> > > risk by recruiting additional committers.
> > >
> > > === Inexperience with Open Source ===
> > > The initial committers include veteran Apache members (Committers, PMC
> > > members and Apache Members) and other developers who have varying degrees
> > > of experience with open source projects. All have been involved with
> > > source code that has been released under an open source license, and
> > > several also have experience developing code with an open source
> > > development process.
> > >
> > > === Homogenous Developers ===
> > > The initial committers are employed by a number of companies, including
> > > Cloudera, Facebook, Hortonworks, Microsoft, Twitter and Yahoo. We are
> > > committed to recruiting additional committers from other companies based
> > > on their contributions to the project even though we do have significant
> > > diversity already.
> > >
> > > === Reliance on Salaried Developers ===
> > > It is expected that Tez development will occur on both salaried time and
> > > on volunteer time, after hours. The majority of initial committers are
> > > paid by their employer to contribute to this project. However, they are
> > > all passionate about the project, and we are confident that the project
> > > will continue even if no salaried developers contribute to the project. We
> > > are committed to recruiting additional committers including non-salaried
> > > developers.
> > >
> > > === Relationships with Other Apache Products ===
> > > As mentioned in the Alignment section, Tez is closely integrated with
> > > Hadoop, Hive and Pig in numerous ways. We look forward to collaborating
> > > with those communities, as well as other Apache communities.
> > >
> > > === An Excessive Fascination with the Apache Brand ===
> > > Tez solves a real need for generic task DAG management in the Apache
> > > Hadoop ecosystem, something which has been addressed in a very ad hoc
> > > manner so far by multiple Apache projects. Our rationale for developing
> > > Tez as an Apache project is detailed in the Rationale section. We believe
> > > that the Apache brand and community process will help us attract more
> > > contributors to this project, and help establish ubiquitous APIs.
> > >
> > > == Documentation ==
> > > http://wiki.apache.org/incubator/TezProposal
> > >
> > > == Initial Source ==
> > > Available as a patch.
> > >
> > > == Cryptography ==
> > > Tez will eventually support encryption on the wire. This is not one of
> > > the initial goals, and we do not expect Tez to be a controlled export item
> > > due to the use of encryption.
> > >
> > > == Required Resources ==
> > >
> > > === Mailing List ===
> > > * tez-private
> > > * tez-dev
> > > * tez-user
> > >
> > > === Subversion Directory ===
> > > Git is the preferred source control system: git://git.apache.org/tez
> > >
> > > === Issue Tracking ===
> > >
> > > JIRA Tez (TEZ)
> > >
> > > == Initial Committers ==
> > > * Alan Gates <gates at apache dot org>
> > > * Arun C Murthy <acmurthy at apache dot org>
> > > * Ashutosh Chauhan <hashutosh at apache dot org>
> > > * Bikas Saha <bikas at apache dot org>
> > > * Chris Douglas <cdouglas at apache dot org>
> > > * Daryn Sharp <daryn at apache dot org>
> > > * Devaraj Das <ddas at apache dot org>
> > > * Gopal Vijayaraghavan <gopal at hortonworks dot com>
> > > * Gunther Hagleitner <ghagleitner at hortonworks dot com>
> > > * Hitesh Shah <hitesh at apache dot org>
> > > * Jason Lowe <jlowe at apache dot org>
> > > * Jean Xu <jeanxu at facebook dot com>
> > > * Jitendra Pandey <jitendra at apache dot org>
> > > * Julien Le Dem <julien at apache dot org>
> > > * Kevin Wilfong <kevinwilfong at apache dot org>
> > > * Mike Liddell <mike dot lidell at microsoft dot com>
> > > * Namit Jain <namit at apache dot org>
> > > * Nathan Roberts <nroberts at yahoo dash inc dot com>
> > > * Owen O'Malley <omalley at apache dot org>
> > > * Robert Evans <bobby at apache dot org>
> > > * Siddharth Seth <sseth at apache dot org>
> > > * Tom White <tomwhite at apache dot org>
> > > * Thomas Graves <tgraves at apache dot org>
> > > * Vikram Dixit <vikram at apache dot org>
> > > * Vinod Kumar Vavilapalli <vinodkv at apache dot org>
> > > * William Graham <billgraham at apache dot org>
> > >
> > > == Affiliations ==
> > > The initial committers are employees of Cloudera, Facebook, Hortonworks,
> > > Microsoft, Twitter and Yahoo Inc.
> > >
> > > * Alan Gates - Hortonworks
> > > * Arun C Murthy - Hortonworks
> > > * Ashutosh Chauhan - Hortonworks
> > > * Bikas Saha - Hortonworks
> > > * Chris Douglas - Microsoft
> > > * Daryn Sharp - Yahoo
> > > * Devaraj Das - Hortonworks
> > > * Gopal Vijayaraghavan - Hortonworks
> > > * Gunther Hagleitner - Hortonworks
> > > * Hitesh Shah - Hortonworks
> > > * Jason Lowe - Yahoo
> > > * Jean Xu - Facebook
> > > * Jitendra Pandey - Hortonworks
> > > * Julien Le Dem - Twitter
> > > * Kevin Wilfong - Facebook
> > > * Mike Liddell - Microsoft
> > > * Namit Jain - Facebook
> > > * Nathan Roberts - Yahoo
> > > * Owen O'Malley - Hortonworks
> > > * Robert Evans - Yahoo
> > > * Siddharth Seth - Hortonworks
> > > * Tom White - Cloudera
> > > * Thomas Graves - Yahoo
> > > * Vikram Dixit - Hortonworks
> > > * Vinod Kumar Vavilapalli - Hortonworks
> > > * William Graham - Twitter
> > >
> > > The nominated mentors are employees of Hortonworks, LinkedIn,
> > > NASA JPL and Microsoft.
> > >
> > > * Alan Gates - Hortonworks
> > > * Arun C Murthy - Hortonworks
> > > * Chris Douglas - Microsoft
> > > * Chris Mattmann - NASA JPL
> > > * Jakob Homan - LinkedIn
> > > * Owen O'Malley - Hortonworks
> > >
> > > == Sponsors ==
> > >
> > > === Champion ===
> > > Arun C Murthy <acmurthy at apache dot org>
> > >
> > > === Nominated Mentors ===
> > > * Alan Gates <gates at apache dot org> - Architect at Hortonworks.
> > > Committer for Pig.
> > > * Arun C Murthy <acmurthy at apache dot org> - Architect at
> > > Hortonworks. Committer for Hadoop.
> > > * Chris Douglas <cdouglas at apache dot org> - Sr. Research Engineer at
> > > Microsoft. Committer for Hadoop.
> > > * Chris Mattmann <mattmann at apache dot org> - Sr. Computer Scientist,
> > > NASA JPL. Committer for Nutch, OODT and Tika.
> > > * Jakob Homan <jghoman at apache dot org> - Sr. Software Engineer,
> > > LinkedIn. Committer for Hadoop, Kafka, Giraph.
> > > * Owen O'Malley <omalley at apache dot org> - Architect at
> > > Hortonworks. Committer for Hadoop, Ambari.
> > >
> > > === Sponsoring Entity ===
> > > Incubator
> > >
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > For additional commands, e-mail: general-help@incubator.apache.org
> >
> >
>
>
> --
> Alejandro
>
