incubator-general mailing list archives

From Ross Gardler <rgard...@opendirective.com>
Subject RE: [PROPOSAL] Tez to join Apache Incubator
Date Tue, 19 Feb 2013 09:51:44 GMT
This is the incubator. It is designed to build the potential for diversity
on *exit*. Even so, the *entry* committer list has committers from 5
distinct organisations and therefore demonstrates diversity already.

Any one of the 10 non-Hortonworks committers can leverage their individual
authority to prevent block voting by other committers. That’s the Apache
Way.

If the podling doesn’t respect this then it won’t graduate. I suspect
everyone on that committer list knows this very well.

Do you have any reason to believe the project will not be able to graduate
with this list of committers?

Ross

 *From:* Ted Dunning <ted.dunning@gmail.com>
*Sent:* 19 February 2013 05:41
*To:* general@incubator.apache.org
*Subject:* Re: [PROPOSAL] Tez to join Apache Incubator

This seems like a reasonable project (basically it is the long fabled
map-reduce-reduce or MCR* in google terminology).

But it is *very* heavy with Hortonworks developers.  By my count, the
proportion is over half from HW with only token representation from other
companies:

  13 Hortonworks
   4 Yahoo
   3 Facebook
   2 Microsoft
   1 Cloudera

Shouldn't this be a bit broader to start with?  Or is that an incubation
task?

On Mon, Feb 18, 2013 at 9:29 PM, Arun C Murthy <acm@hortonworks.com> wrote:

> Folks,
>
>  I'd like to propose adding Tez to the Apache Incubator:
> http://wiki.apache.org/incubator/TezProposal
>
>  Essentially, it's the next step to improve projects in the Apache Hadoop
> ecosystem such as Apache Hive, Apache Pig, Cascading (ASL2, but not an ASF
> project) by providing a more complex DAG of 'tasks' in a single
> application to process data, thereby providing significant advantages for
> them.
>
>  During the time I've spent working on MapReduce, I've forever heard
> complaints from Pig/Hive folks about the fact that MapReduce provides a
> very constrained task graph which results in an excessive number of
> MapReduce jobs... *smile*. It's very exciting to take this next step, and
> I would be thrilled to have it happen in the ASF - as you can see in the
> proposal, this effort has broad support from members of the MapReduce,
> Hive & Pig communities, many of whom are eager to participate and have
> already contributed their efforts during the initial prototype.
>
>  I welcome your feedback/discussion and look forward to it!
>
> thanks,
> Arun
> (proposed Champion)
>
> ----
>
> = Tez =
>
> == Abstract ==
> Tez is an effort to develop a generic application framework which can be
> used to process arbitrarily complex data-processing tasks and also a
> re-usable set of data-processing primitives which can be used by other
> projects.
>
> == Proposal ==
> Tez is a proposal to develop a generic application which can be used to
> process complex data-processing task DAGs and runs natively on Apache
> Hadoop YARN. YARN is a generic resource-management system on which
> applications like MapReduce already run. MapReduce is a specific, and
> constrained, DAG, which is not optimal for several frameworks like Apache
> Hive and Apache Pig. Furthermore, we propose to develop a re-usable set of
> libraries of data-processing primitives such as sorting, merging,
> data-shuffling, intermediate data management etc. which are necessary for
> Tez and which we envision can be used directly by other projects.
>
> == Background ==
> Apache Hadoop MapReduce has emerged as the assembly-language on which
> other frameworks like Apache Pig and Apache Hive have been built. However,
> it has been well accepted that MapReduce produces very constrained task
> DAGs for each job, which results in Apache Pig and Apache Hive requiring
> multiple MapReduce jobs for several queries. By providing a more
> expressive DAG of tasks for a job, Tez attempts to provide significantly
> enhanced data-processing capabilities for projects like Apache Pig, Apache
> Hive, Cascading etc.
>
> == Rationale ==
> Tez fills an important gap in the Apache Hadoop ecosystem by allowing for
> more expressive task DAGs for data-processing applications such as Apache
> Pig, Apache Hive, Cascading etc.
>
> With the emergence of Apache Hadoop YARN, there is a strong need for a
> common DAG application which can then be shared by Apache Pig, Apache
> Hive, Cascading etc.
>
> == Initial Goals ==
> The initial goals for this project are to specify the detailed
> requirements and architecture, and then develop the initial implementation
> including the DAG ApplicationMaster to run natively inside Apache Hadoop
> YARN.
>
> == Current Status ==
> Significant work has been completed to identify the initial requirements
> and define the overall system architecture. There is a patch available in
> the internal Hortonworks git repository which can act as the initial seed.
>
> === Meritocracy ===
> We plan to invest in supporting a meritocracy. We will discuss the
> requirements in an open forum. Several companies have already expressed
> interest in this project, and we intend to invite additional developers to
> participate. We will encourage and monitor community participation so that
> privileges can be extended to those that contribute.
>
> === Community ===
> The need for a generic DAG application for data processing in open source
> is tremendous, so there is potential for a very large community. We
> believe that Tez's extensible architecture will further encourage
> community participation. Also, related Apache projects (e.g., Pig, Hive)
> have very large and active communities, and we expect that over time Tez
> will also attract a large community.
>
> === Core Developers ===
> The developers on the initial committers list include people very
> experienced in the Apache Hadoop ecosystem:
>
>  * Alan Gates <gates at apache dot org>
>  * Arun C Murthy <acmurthy at apache dot org>
>  * Ashutosh Chauhan <hashutosh at apache dot org>
>  * Bikas Saha <bikas at apache dot org>
>  * Chris Douglas <cdouglas at apache dot org>
>  * Daryn Sharp <daryn at apache dot org>
>  * Devaraj Das <ddas at apache dot org>
>  * Gopal Vijayaraghavan <gopal at hortonworks dot com>
>  * Gunther Hagleitner <ghagleitner at hortonworks dot com>
>  * Hitesh Shah <hitesh at apache dot org>
>  * Jason Lowe <jlowe at apache dot org>
>  * Jean Xu <jeanxu at facebook dot com>
>  * Jitendra Pandey <jitendra at apache dot org>
>  * Kevin Wilfong <kevinwilfong at apache dot org>
>  * Mike Liddell <mike dot lidell at microsoft dot com>
>  * Namit Jain <namit at apache dot org>
>  * Owen O'Malley <omalley at apache dot org>
>  * Robert Evans <bobby at apache dot org>
>  * Siddharth Seth <sseth at apache dot org>
>  * Tom White <tomwhite at apache dot org>
>  * Thomas Graves <tgraves at apache dot org>
>  * Vikram Dixit <vikram at apache dot org>
>  * Vinod Kumar Vavilapalli <vinodkv at apache dot org>
>
> We realize that though we have significant employer diversity already,
> additional diversity is always better, and we will work
> aggressively to recruit developers from additional companies.
>
> === Alignment ===
> The initial committers strongly believe that a standard task DAG
> application on Apache Hadoop YARN will gain broader adoption as an open
> source, community driven project, where the community can contribute not
> only to the core components, but also to a growing collection of
> applications which will be based on top of Tez. Our hope is that the
> Apache Hive, Apache Pig, Cascading and other communities will find
> tremendous value in Tez and will adopt it en masse.
>
> == Known Risks ==
>
> === Orphaned Products ===
> The contributors are leading users and vendors in the Apache Hadoop
> ecosystem, with significant open source experience, so the risk of being
> orphaned is relatively low. The project could be at risk if vendors
> decided to change their strategies in the market. In such an event, the
> current committers plan to continue working on the project on their own
> time, though the progress will likely be slower. We plan to mitigate this
> risk by recruiting additional committers.
>
> === Inexperience with Open Source ===
> The initial committers include veteran Apache members (Committers, PMC
> members and Apache Members) and other developers who have varying degrees
> of experience with open source projects. All have been involved with
> source code that has been released under an open source license, and
> several also have experience developing code with an open source
> development process.
>
> === Homogenous Developers ===
> The initial committers are employed by a number of companies, including
> Cloudera, Facebook, Hortonworks, Microsoft and Yahoo. We are committed to
> recruiting additional committers from other companies based on their
> contributions to the project even though we do have significant diversity
> already.
>
> === Reliance on Salaried Developers ===
> It is expected that Tez development will occur on both salaried time and
> on volunteer time, after hours. The majority of initial committers are
> paid by their employer to contribute to this project. However, they are
> all passionate about the project, and we are confident that the project
> will continue even if no salaried developers contribute to the project. We
> are committed to recruiting additional committers including non-salaried
> developers.
>
> === Relationships with Other Apache Products ===
> As mentioned in the Alignment section, Tez is closely integrated with
> Hadoop, Hive and Pig in numerous ways. We look forward to collaborating
> with those communities, as well as other Apache communities.
>
> === An Excessive Fascination with the Apache Brand ===
> Tez solves a real need for generic task DAG management in the Apache
> Hadoop ecosystem, something which has been addressed in a very ad hoc
> manner so far by multiple Apache projects. Our rationale for developing
> Tez as an Apache project is detailed in the Rationale section. We believe
> that the Apache brand and community process will help us attract more
> contributors to this project, and help establish ubiquitous APIs.
>
> == Documentation ==
> http://wiki.apache.org/incubator/TezProposal
>
> == Initial Source ==
> Available as a patch.
>
> == Cryptography ==
> Tez will eventually support encryption on the wire. This is not one of the
> initial goals, and we do not expect Tez to be a controlled export item due
> to the use of encryption.
>
> == Required Resources ==
>
> === Mailing List ===
>  * tez-private
>  * tez-dev
>  * tez-user
>
> === Subversion Directory ===
> Git is the preferred source control system: git://git.apache.org/tez
>
> === Issue Tracking ===
>
> JIRA Tez (TEZ)
>
> == Initial Committers ==
>  * Alan Gates <gates at apache dot org>
>  * Arun C Murthy <acmurthy at apache dot org>
>  * Ashutosh Chauhan <hashutosh at apache dot org>
>  * Bikas Saha <bikas at apache dot org>
>  * Chris Douglas <cdouglas at apache dot org>
>  * Daryn Sharp <daryn at apache dot org>
>  * Devaraj Das <ddas at apache dot org>
>  * Gopal Vijayaraghavan <gopal at hortonworks dot com>
>  * Gunther Hagleitner <ghagleitner at hortonworks dot com>
>  * Hitesh Shah <hitesh at apache dot org>
>  * Jason Lowe <jlowe at apache dot org>
>  * Jean Xu <jeanxu at facebook dot com>
>  * Jitendra Pandey <jitendra at apache dot org>
>  * Kevin Wilfong <kevinwilfong at apache dot org>
>  * Mike Liddell <mike dot lidell at microsoft dot com>
>  * Namit Jain <namit at apache dot org>
>  * Owen O'Malley <omalley at apache dot org>
>  * Robert Evans <bobby at apache dot org>
>  * Siddharth Seth <sseth at apache dot org>
>  * Tom White <tomwhite at apache dot org>
>  * Thomas Graves <tgraves at apache dot org>
>  * Vikram Dixit <vikram at apache dot org>
>  * Vinod Kumar Vavilapalli <vinodkv at apache dot org>
>
> == Affiliations ==
> The initial committers are employees of Cloudera, Facebook, Hortonworks,
> Microsoft  and Yahoo Inc.
>
>  * Alan Gates - Hortonworks
>  * Arun C Murthy - Hortonworks
>  * Ashutosh Chauhan - Hortonworks
>  * Bikas Saha - Hortonworks
>  * Chris Douglas - Microsoft
>  * Daryn Sharp - Yahoo
>  * Devaraj Das - Hortonworks
>  * Gopal Vijayaraghavan - Hortonworks
>  * Gunther Hagleitner - Hortonworks
>  * Hitesh Shah - Hortonworks
>  * Jason Lowe - Yahoo
>  * Jean Xu - Facebook
>  * Jitendra Pandey - Hortonworks
>  * Kevin Wilfong - Facebook
>  * Mike Liddell - Microsoft
>  * Namit Jain - Facebook
>  * Owen O'Malley - Hortonworks
>  * Robert Evans - Yahoo
>  * Siddharth Seth - Hortonworks
>  * Tom White - Cloudera
>  * Thomas Graves - Yahoo
>  * Vikram Dixit - Hortonworks
>  * Vinod Kumar Vavilapalli - Hortonworks
>
> The nominated mentors are employees of Hortonworks,
> NASA JPL and Microsoft.
>
>  * Alan Gates - Hortonworks
>  * Arun C Murthy - Hortonworks
>  * Chris Douglas - Microsoft
>  * Chris Mattmann - NASA JPL
>  * Owen O'Malley - Hortonworks
>
> == Sponsors ==
>
> === Champion ===
> Arun C Murthy <acmurthy at apache dot org>
>
> === Nominated Mentors ===
>  * Alan Gates <gates at apache dot org> – Architect at Hortonworks.
> Committer for Pig.
>  * Arun C Murthy <acmurthy at apache dot org> – Architect at
> Hortonworks. Committer for Hadoop.
>  * Chris Douglas <cdouglas at apache dot org> - Sr. Research Engineer at
> Microsoft. Committer for Hadoop.
>  * Chris Mattmann <mattmann at apache dot org> - Sr. Computer Scientist,
> NASA JPL. Committer for Nutch, OODT and Tika.
>  * Owen O'Malley <omalley at apache dot org> – Architect at Hortonworks.
> Committer for Hadoop, Ambari.
>
> === Sponsoring Entity ===
> Incubator
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>
