incubator-general mailing list archives

From Alex Karasulu <akaras...@apache.org>
Subject Re: [VOTE] Accept Howl as an Incubator Project
Date Wed, 23 Feb 2011 09:55:48 GMT
+1

However, there might be a name conflict with the HOWL transaction log over at
ObjectWeb:

   http://howl.ow2.org/

From the title alone, I initially thought the proposal was for ObjectWeb's
HOWL implementation to enter the incubator.

Best,
Alex

On Wed, Feb 23, 2011 at 10:16 AM, Tommaso Teofili <tommaso.teofili@gmail.com> wrote:

> +1
> Tommaso
>
> 2011/2/23 Alan Gates <gates@yahoo-inc.com>
>
> > I would like to call a vote on accepting Howl as an Incubator project.
> > The proposal is available at http://wiki.apache.org/incubator/HowlProposal.
> > You can see the discussion from the proposal thread at
> > http://tinyurl.com/5w7y9p9.
> >
> > Alan.
> >
> > ----------------------
> >
> > Abstract
> > Howl is a table and storage management service for data created using
> > Apache Hadoop.
> >
> >
> > Proposal
> > The vision of Howl is to provide table management and storage management
> > layers for Apache Hadoop. This includes:
> >
> >        • Providing a shared schema and data type mechanism.
> >        • Providing a table abstraction so that users need not be
> >          concerned with where or how their data is stored.
> >        • Providing interoperability across data processing tools such as
> >          Pig, Map Reduce, Streaming, and Hive.
> >
> > Background
> > Data processors using Apache Hadoop have a common need for table
> > management services. The goal of a table management service is to track
> > data that exists in a Hadoop grid and present that data to users in a
> > tabular format. Such a table management service needs to provide a single
> > input and output format to users so that individual users need not be
> > concerned with the storage formats that are chosen for particular data
> > sets. As part of having a single format, the data will need to be
> > described by one type of schema and have a single datatype system.
> >
> > Additionally, users should be free to choose the best tools for their use
> > cases. The Hadoop project includes Map Reduce, Streaming, Pig, and Hive,
> > and additional tools exist such as Cascading. Each of these tools has
> > users who prefer it, and there are use cases best addressed by each of
> > these tools. Two users on the same grid who need to share data should not
> > be constrained to use the same tool but rather should be free to choose
> > the best tool for their use case. A table management service that presents
> > data in the same way to all of the tools can alleviate this problem by
> > providing interfaces to each of the data processing tools.
> >
> > There are also a few other features a table management service should
> > provide, such as notification of when data arrives.
> >
> > A couple of developers at Yahoo! started the project. It is based on the
> > Hive MetaStore component. Yahoo!, Facebook, LinkedIn, and others have
> > expressed a good amount of interest in such a service. We are therefore
> > proposing to place Howl in the Apache Incubator and to build an open
> > source community around it.
> >
> >
> > Rationale
> > There is a strong need for a table management service, especially for
> > large grids with petabytes of data, and where the data volume is
> > increasing by the day. Hadoop users need to find data to read and have a
> > place to store their data. Currently users must understand the location of
> > data to read, the storage format, compression techniques used, etc. To
> > write data they need to understand where on HDFS their data belongs, the
> > best compression format to use, how their data should be serialized, etc.
> >
> > Most users do not want to be concerned with these issues. They want these
> > managed for them.
> >
> > Being an Apache open source project will benefit Howl greatly by giving
> > it access to the large community that currently uses Hadoop and the other
> > products built around Hadoop (like Pig, Hive, etc.). Users of the Hadoop
> > ecosystem can influence Howl’s roadmap and contribute to it. Conversely,
> > we believe having Howl as part of the Hadoop ecosystem will be a great
> > benefit to the current Hadoop/Pig/Hive community.
> >
> >
> > Current Status
> >
> > Meritocracy
> > Our intent with this incubator proposal is to start building a diverse
> > developer community around Howl following the Apache meritocracy model.
> > We have wanted to make the project open source and encourage contributors
> > from multiple organizations from the start. We plan to provide plenty of
> > support to new developers and to quickly recruit those who make solid
> > contributions to committer status.
> >
> >
> > Community
> > Howl is currently being used by developers at Yahoo!, and LinkedIn and
> > Facebook have expressed interest. Yahoo! also plans to deploy the current
> > version of Howl in production soon. We hope to extend the user and
> > developer base further in the future. The current developers and users
> > are all interested in building a solid open source community around Howl.
> >
> > To work towards an open source community, we have started using the
> > GitHub issue tracker and mailing lists at Yahoo! for development
> > discussions within our group.
> >
> >
> > Core Developers
> > Howl is currently being developed by four engineers from Yahoo! - Devaraj
> > Das, Ashutosh Chauhan, Sushanth Sowmyan, and Mac Yang. All the engineers
> > have deep expertise in Hadoop and the Hadoop Ecosystem in general.
> >
> >
> > Alignment
> > The ASF is a natural host for Howl given that it is already the home of
> > Hadoop, Pig, HBase, Cassandra, and other emerging cloud software
> > projects. Howl was designed to support Hadoop from the beginning in order
> > to solve data management challenges in Hadoop clusters. Howl complements
> > the existing Apache cloud computing projects by providing a unified way
> > to manage data.
> >
> >
> > Known Risks
> >
> > Orphaned Products
> > The core developers plan to work full time on the project. There is very
> > little risk of Howl getting orphaned since large companies like Yahoo!
> > are planning to deploy this in their production Hadoop clusters. We
> > believe we can build an active developer community around Howl (companies
> > like Facebook and LinkedIn have also expressed interest).
> >
> >
> > Inexperience with Open Source
> > All of the core developers are active users and followers of open source.
> > Devaraj Das is an Apache Hadoop committer and Apache Hadoop PMC member,
> > and has experience with the Apache infrastructure and development
> > process. Ashutosh Chauhan is an Apache Pig committer and Apache Pig PMC
> > member. Sushanth Sowmyan and Mac Yang made contributions to the Apache
> > Hive and the Apache Chukwa projects.
> >
> >
> > Homogeneous Developers
> > The current core developers are all from Yahoo! However, we hope to
> > establish a developer community that includes contributors from several
> > corporations, and we are starting to work towards this with Facebook and
> > LinkedIn.
> >
> >
> > Reliance on Salaried Developers
> > Currently, the developers are paid to do work on Howl. However, once the
> > project has a community built around it, we expect to get committers and
> > developers from outside the current core developers. Companies like
> > Yahoo! are invested in Howl being a solution to the data management
> > problem in Hadoop clusters, and that is not likely to change.
> >
> >
> > Relationships with Other Apache Products
> > Howl is going to be used by users of Hadoop, Pig, and Hive. See the
> > Initial Source section below for more information about Howl's
> > relationship to Hive.
> >
> >
> > An Excessive Fascination with the Apache Brand
> > While we respect the reputation of the Apache brand and have no doubts
> > that it will attract contributors and users, our interest is primarily to
> > give Howl a solid home as an open source project following an established
> > development model. We have also given reasons in the Rationale and
> > Alignment sections.
> >
> >
> > Documentation
> > Information about Howl can be found at http://wiki.apache.org/pig/Howl.
> > The following sources may be useful to start with:
> >
> >        • The GitHub site: https://github.com/yahoo/howl
> >        • The roadmap: http://wiki.apache.org/pig/HowlJournal
> >
> >
> > Initial Source
> > Howl has been under development since Summer 2010 by a team of engineers
> > at Yahoo!. It is currently hosted on GitHub under an Apache license at
> > https://github.com/yahoo/howl.
> >
> > The initial development of Howl has consisted of:
> >
> >        • maintaining a branch of the entire Hive codebase
> >        • getting Howl-related patches committed to Hive
> >        • developing Howl-specific plugins and wrappers to customize Hive
> >          behavior
> >
> > At runtime, Howl executes Hive code for metastore and CLI+DDL, disabling
> > anything related to Hadoop map/reduce execution. It also makes use of the
> > RCFile storage format contained in Hive.
> >
> > This approach was taken as a first step in order to validate the required
> > functionality and get a production version working. However, in the
> > long-term, maintaining a clone of Hive is undesirable. One possible
> > resolution is to factor the metastore+CLI+DDL components out of Hive and
> > move them into Howl (making Hive dependent on Howl). Another possible
> > resolution is to remove the copy of Hive from Howl and do the
> > build/release engineering necessary to make Howl depend on Hive. As part
> > of the incubation process, we plan to work towards resolution of these
> > issues.
> >
> >
> > External Dependencies
> > The dependencies all have Apache compatible licenses.
> >
> >
> > Cryptography
> > Not applicable.
> >
> >
> > Required Resources
> >
> > Mailing Lists
> >        • howl-private for private PMC discussions (with moderated
> >          subscriptions)
> >        • howl-dev
> >        • howl-commits
> >        • howl-user
> >
> > Subversion Directory
> > https://svn.apache.org/repos/asf/incubator/howl
> >
> >
> > Issue Tracking
> > JIRA Howl (HOWL)
> >
> >
> > Other Resources
> > The existing code already has unit tests, so we would like a Hudson
> > instance to run them whenever a new patch is submitted. This can be added
> > after project creation.
> >
> >
> > Initial Committers
> >        • Devaraj Das
> >        • Ashutosh Chauhan
> >        • Sushanth Sowmyan
> >        • Mac Yang
> >        • Paul Yang
> >        • Alan Gates
> >
> > A CLA is already on file for Sushanth.
> >
> >
> > Affiliations
> >        • Devaraj Das (Yahoo!)
> >        • Ashutosh Chauhan (Yahoo!)
> >        • Sushanth Sowmyan (Yahoo!)
> >        • Mac Yang (Yahoo!)
> >        • Paul Yang (Facebook)
> >        • Alan Gates (Yahoo!)
> >
> > Sponsors
> >
> > Champion
> > Owen O’Malley
> >
> >
> > Nominated Mentors
> >        • Olga Natkovich (Pig PMC member and Apache VP for Pig)
> >        • Alan Gates (Pig PMC member)
> >        • John Sichi (Hive PMC member)
> >
> > Sponsoring Entity
> > We are requesting the Incubator to sponsor this project.
> >
> >
>
