incubator-general mailing list archives

From Brian McCallister <bri...@skife.org>
Subject Re: [PROPOSAL] Propose Howl as an Apache Incubator project
Date Sun, 13 Feb 2011 17:52:56 GMT
The proposal looks fine, but the name collides with http://howl.ow2.org/

-Brian

On Thu, Feb 10, 2011 at 1:37 PM, Alan Gates <gates@yahoo-inc.com> wrote:
> I would like to propose Howl as an Apache Incubator project.  Howl is a
> table and storage management service for data created using Apache Hadoop.
>  The proposal is on the Incubator wiki at
> http://wiki.apache.org/incubator/HowlProposal and is pasted below.  Thanks.
>
> Alan.
>
> == Abstract ==
> Howl is a table and storage management service for data created using Apache
> Hadoop.
>
> == Proposal ==
> The vision of Howl is to provide table management and storage management
> layers for Apache Hadoop.  This includes:
>  * Providing a shared schema and data type mechanism.
>  * Providing a table abstraction so that users need not be concerned with
> where or how their data is stored.
>  * Providing interoperability across data processing tools such as Pig, Map
> Reduce, Streaming, and Hive.
>
> == Background ==
> Data processors using Apache Hadoop have a common need for table management
> services.  The goal of a table management service is to track data that
> exists in a Hadoop grid and present that data to users in a tabular format.
>  Such a table management service needs to provide a single input and output
> format to users so that individual users need not be concerned with the
> storage formats that are chosen for particular data sets.  As part of having
> a single format, the data will need to be described by one type of schema
> and have a single datatype system.
>
> Additionally, users should be free to choose the best tools for their use
> cases.  The Hadoop project includes Map Reduce, Streaming, Pig, and Hive,
> and additional tools exist such as Cascading.  Each of these tools has users
> who prefer it, and there are use cases best addressed by each of these
> tools.  Two users on the same grid who need to share data should not be
> constrained to use the same tool but rather should be free to choose the
> best tool for their use case.  A table management service that presents data
> in the same way to all of the tools can alleviate this problem by providing
> interfaces to each of the data processing tools.
>
> There are also a few other features a table management service should
> provide, such as notification of when data arrives.
>
> A couple of developers at Yahoo! started the project. It is based on the
> Hive MetaStore component. There is a good amount of interest in such a
> service from Yahoo!, Facebook, LinkedIn, and others. We are
> therefore proposing to place Howl in the Apache Incubator and to build an
> open source community around it.
>
>
> == Rationale ==
> There is a strong need for a table management service, especially for large
> grids with petabytes of data, and where the data volume is increasing by the
> day. Hadoop users need to find data to read and have a place to store their
> data.  Currently users must understand the location of data to read, the
> storage format, compression techniques used, etc.  To write data they need
> to understand where on HDFS their data belongs, the best compression format
> to use, how their data should be serialized, etc.
>
> Most users do not want to be concerned with these issues.  They want these
> managed for them.
>
> Becoming an Apache open source project would benefit Howl by giving it
> access to the large community that already uses Hadoop and the products
> built around it (such as Pig and Hive). Users of the Hadoop ecosystem can
> influence Howl’s roadmap and contribute to it. Conversely, we believe that
> having Howl as part of the Hadoop ecosystem will greatly benefit the
> current Hadoop/Pig/Hive community.
>
> == Current Status ==
> === Meritocracy ===
> Our intent with this incubator proposal is to start building a diverse
> developer community around Howl following the Apache meritocracy model. We
> have wanted to make the project open source and encourage contributors from
> multiple organizations from the start. We plan to provide plenty of support
> to new developers and to quickly recruit those who make solid contributions
> to committer status.
>
> === Community ===
> Howl is currently being used by developers at Yahoo!, and LinkedIn and
> Facebook have expressed interest. Yahoo! also plans to deploy
> the current version of Howl in production soon. We hope to extend the user
> and developer base further in the future. The current developers and users
> are all interested in building a solid open source community around Howl.
>
> To work towards an open source community, we have started using the GitHub
> issue tracker and mailing lists at Yahoo! for development discussions within
> our group.
>
> === Core Developers ===
> Howl is currently being developed by four engineers from Yahoo! - Devaraj
> Das, Ashutosh Chauhan, Sushanth Sowmyan, and Mac Yang. All the engineers
> have deep expertise in Hadoop and the Hadoop Ecosystem in general.
>
> === Alignment ===
> The ASF is a natural host for Howl given that it is already the home of
> Hadoop, Pig, HBase, Cassandra, and other emerging cloud software projects.
> Howl was designed to support Hadoop from the beginning in order to solve
> data management challenges in Hadoop clusters. Howl complements the existing
> Apache cloud computing projects by providing a unified way to manage data.
>
> == Known Risks ==
> === Orphaned Products ===
> The core developers plan to work full time on the project. There is very
> little risk of Howl getting orphaned since large companies like Yahoo! are
> planning to deploy this in their production Hadoop clusters. We believe we
> can build an active developer community around Howl (companies like Facebook
> and LinkedIn have also expressed interest).
>
> === Inexperience with Open Source ===
> All of the core developers are active users and followers of open source.
> Devaraj Das is an Apache Hadoop committer and Apache Hadoop PMC member, and
> has experience with the Apache infrastructure and development process.
> Ashutosh Chauhan is an Apache Pig committer and Apache Pig PMC member.
> Sushanth Sowmyan and Mac Yang have contributed to the Apache Hive and
> Apache Chukwa projects.
>
> === Homogeneous Developers ===
> The current core developers are all from Yahoo! However, we hope to
> establish a developer community that includes contributors from several
> corporations, and we are starting to work towards this with Facebook and
> LinkedIn.
>
> === Reliance on Salaried Developers ===
> Currently, the developers are paid to do work on Howl. However, once the
> project has a community built around it, we expect to get committers and
> developers from outside the current core developers. Companies like Yahoo!
> are invested in Howl being a solution to the data management problem in
> Hadoop clusters, and that is not likely to change.
>
> === Relationships with Other Apache Products ===
> Howl will be used by users of Hadoop, Pig, and Hive. See the Initial
> Source section below for more information about Howl's relationship to Hive.
>
> === An Excessive Fascination with the Apache Brand ===
> While we respect the reputation of the Apache brand and have no doubts that
> it will attract contributors and users, our interest is primarily to give
> Howl a solid home as an open source project following an established
> development model. We have also given reasons in the Rationale and Alignment
> sections.
>
> == Documentation ==
> Information about Howl can be found at http://wiki.apache.org/pig/Howl. The
> following sources may be useful to start with:
>  * The GitHub site: https://github.com/yahoo/howl
>  * The roadmap: http://wiki.apache.org/pig/HowlJournal
>
> == Initial Source ==
> Howl has been under development since Summer 2010 by a team of engineers at
> Yahoo!.  It is currently hosted on GitHub under an Apache license at
> https://github.com/yahoo/howl.
>
> The initial development of Howl has consisted of:
>
>  * maintaining a branch of the entire Hive codebase
>  * getting Howl-related patches committed to Hive
>  * developing Howl-specific plugins and wrappers to customize Hive behavior
>
> At runtime, Howl executes Hive code for metastore and CLI+DDL, disabling
> anything related to Hadoop map/reduce execution.  It also makes use of the
> RCFile storage format contained in Hive.
>
> This approach was taken as a first step in order to validate the required
> functionality and get a production version working.  However, in the
> long-term, maintaining a clone of Hive is undesirable.  One possible
> resolution is to factor the metastore+CLI+DDL components out of Hive and
> move them into Howl (making Hive dependent on Howl).  Another possible
> resolution is to remove the copy of Hive from Howl and do the build/release
> engineering necessary to make Howl depend on Hive.  As part of the
> incubation process, we plan to work towards resolution of these issues.
>
> == External Dependencies ==
> The dependencies all have Apache compatible licenses.
>
> == Cryptography ==
> Not applicable.
>
> == Required Resources ==
> === Mailing Lists ===
>  * howl-private for private PMC discussions (with moderated subscriptions)
>  * howl-dev
>  * howl-commits
>  * howl-user
> === Subversion Directory ===
> https://svn.apache.org/repos/asf/incubator/howl
>
> === Issue Tracking ===
> JIRA Howl (HOWL)
>
> === Other Resources ===
> The existing code already has unit tests, so we would like a Hudson instance
> to run them whenever a new patch is submitted. This can be added after
> project creation.
>
> == Initial Committers ==
>  * Devaraj Das
>  * Ashutosh Chauhan
>  * Sushanth Sowmyan
>  * Mac Yang
>  * Paul Yang
>  * Alan Gates
> A CLA is already on file for Sushanth.
>
> == Affiliations ==
>  * Devaraj Das (Yahoo!)
>  * Ashutosh Chauhan (Yahoo!)
>  * Sushanth Sowmyan (Yahoo!)
>  * Mac Yang (Yahoo!)
>  * Paul Yang (Facebook)
>  * Alan Gates (Yahoo!)
>
> == Sponsors ==
> === Champion ===
> Owen O’Malley
>
> === Nominated Mentors ===
>  * Olga Natkovich (Pig PMC member and Apache VP for Pig)
>  * Alan Gates (Pig PMC member)
>  * John Sichi (Hive PMC member)
>
> === Sponsoring Entity ===
> We are requesting the Incubator to sponsor this project.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>
