From: Tommaso Teofili
Date: Wed, 23 Feb 2011 09:16:19 +0100
Subject: Re: [VOTE] Accept Howl as an Incubator Project
To: general@incubator.apache.org
In-Reply-To: <3BEA62CE-B2D6-4474-B76E-FE176A9D2BFE@yahoo-inc.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=000e0cd5c924a36433049ceeb87d

--000e0cd5c924a36433049ceeb87d
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

+1
Tommaso

2011/2/23 Alan Gates

> I would like to call a vote on accepting Howl as an Incubator project. The
> proposal is available at http://wiki.apache.org/incubator/HowlProposal.
> You can see the discussion from the proposal thread at
> http://tinyurl.com/5w7y9p9.
>
> Alan.
>
> ----------------------
>
> Abstract
> Howl is a table and storage management service for data created using
> Apache Hadoop.
>
>
> Proposal
> The vision of Howl is to provide table management and storage management
> layers for Apache Hadoop. This includes:
>
> • Providing a shared schema and data type mechanism.
> • Providing a table abstraction so that users need not be concerned
>   with where or how their data is stored.
> • Providing interoperability across data processing tools such as
>   Pig, Map Reduce, Streaming, and Hive.
>
> Background
> Data processors using Apache Hadoop have a common need for table management
> services.
> The goal of a table management service is to track data that
> exists in a Hadoop grid and present that data to users in a tabular format.
> Such a table management service needs to provide a single input and output
> format to users so that individual users need not be concerned with the
> storage formats that are chosen for particular data sets. As part of having
> a single format, the data will need to be described by one type of schema
> and have a single datatype system.
>
> Additionally, users should be free to choose the best tools for their use
> cases. The Hadoop project includes Map Reduce, Streaming, Pig, and Hive, and
> additional tools such as Cascading exist. Each of these tools has users who
> prefer it, and there are use cases best addressed by each of these tools.
> Two users on the same grid who need to share data should not be constrained
> to use the same tool but rather should be free to choose the best tool for
> their use case. A table management service that presents data in the same
> way to all of the tools can alleviate this problem by providing interfaces
> to each of the data processing tools.
>
> There are also a few other features a table management service should
> provide, such as notification of when data arrives.
>
> A couple of developers at Yahoo! started the project. It is based on the
> Hive MetaStore component. There is a good amount of interest in such a
> service from Yahoo!, Facebook, LinkedIn, and others. We are therefore
> proposing to place Howl in the Apache incubator and to build an open source
> community around it.
>
>
> Rationale
> There is a strong need for a table management service, especially for large
> grids with petabytes of data, and where the data volume is increasing by the
> day. Hadoop users need to find data to read and have a place to store their
> data.
> Currently, users must understand the location of the data they read, the
> storage format, the compression techniques used, etc. To write data, they
> need to understand where on HDFS their data belongs, the best compression
> format to use, how their data should be serialized, etc.
>
> Most users do not want to be concerned with these issues. They want these
> managed for them.
>
> Becoming an Apache open source project will greatly benefit Howl by giving
> it access to the large community that currently uses Hadoop and the other
> products built around Hadoop (such as Pig and Hive). Users of the Hadoop
> ecosystem can influence Howl’s roadmap and contribute to it. Looked at
> another way, we believe having Howl as part of the Hadoop ecosystem will
> also be a great benefit to the current Hadoop/Pig/Hive community.
>
>
> Current Status
>
> Meritocracy
> Our intent with this incubator proposal is to start building a diverse
> developer community around Howl following the Apache meritocracy model. We
> have wanted to make the project open source and encourage contributors from
> multiple organizations from the start. We plan to provide plenty of support
> to new developers and to quickly promote those who make solid contributions
> to committer status.
>
>
> Community
> Howl is currently being used by developers at Yahoo!, and LinkedIn and
> Facebook have expressed interest. Yahoo! also plans to deploy
> the current version of Howl in production soon. We hope to extend the user
> and developer base further in the future. The current developers and users
> are all interested in building a solid open source community around Howl.
>
> To work towards an open source community, we have started using the GitHub
> issue tracker and mailing lists at Yahoo! for development discussions within
> our group.
>
>
> Core Developers
> Howl is currently being developed by four engineers from Yahoo!
> - Devaraj Das, Ashutosh Chauhan, Sushanth Sowmyan, and Mac Yang. All the
> engineers have deep expertise in Hadoop and the Hadoop Ecosystem in general.
>
>
> Alignment
> The ASF is a natural host for Howl given that it is already the home of
> Hadoop, Pig, HBase, Cassandra, and other emerging cloud software projects.
> Howl was designed to support Hadoop from the beginning in order to solve
> data management challenges in Hadoop clusters. Howl complements the existing
> Apache cloud computing projects by providing a unified way to manage data.
>
>
> Known Risks
>
> Orphaned Products
> The core developers plan to work full time on the project. There is very
> little risk of Howl getting orphaned since large companies like Yahoo! are
> planning to deploy this in their production Hadoop clusters. We believe we
> can build an active developer community around Howl (companies like Facebook
> and LinkedIn have also expressed interest).
>
>
> Inexperience with Open Source
> All of the core developers are active users and followers of open source.
> Devaraj Das is an Apache Hadoop committer and Apache Hadoop PMC member, and
> has experience with the Apache infrastructure and development process.
> Ashutosh Chauhan is an Apache Pig committer and Apache Pig PMC member.
> Sushanth Sowmyan and Mac Yang made contributions to the Apache Hive and the
> Apache Chukwa projects.
>
>
> Homogeneous Developers
> The current core developers are all from Yahoo! However, we hope to
> establish a developer community that includes contributors from several
> corporations, and we are starting to work towards this with Facebook and
> LinkedIn.
>
>
> Reliance on Salaried Developers
> Currently, the developers are paid to do work on Howl. However, once the
> project has a community built around it, we expect to get committers and
> developers from outside the current core developers. Companies like Yahoo!
> are invested in Howl being a solution to the data management problem in
> Hadoop clusters, and that is not likely to change.
>
>
> Relationships with Other Apache Products
> Howl is going to be used by users of Hadoop, Pig, and Hive. See the
> Initial Source section below for more information about Howl's relationship
> to Hive.
>
>
> An Excessive Fascination with the Apache Brand
> While we respect the reputation of the Apache brand and have no doubts that
> it will attract contributors and users, our interest is primarily to give
> Howl a solid home as an open source project following an established
> development model. We have also given reasons in the Rationale and Alignment
> sections.
>
>
> Documentation
> Information about Howl can be found at http://wiki.apache.org/pig/Howl.
> The following sources may be useful to start with:
>
> • The GitHub site: https://github.com/yahoo/howl
>
> • The roadmap: http://wiki.apache.org/pig/HowlJournal
>
>
> Initial Source
> Howl has been under development since Summer 2010 by a team of engineers at
> Yahoo!. It is currently hosted on GitHub under an Apache license at
> https://github.com/yahoo/howl.
>
> The initial development of Howl has consisted of:
>
> • maintaining a branch of the entire Hive codebase
> • getting Howl-related patches committed to Hive
> • developing Howl-specific plugins and wrappers to customize Hive
>   behavior
>
> At runtime, Howl executes Hive code for metastore and CLI+DDL, disabling
> anything related to Hadoop map/reduce execution. It also makes use of the
> RCFile storage format contained in Hive.
>
> This approach was taken as a first step in order to validate the required
> functionality and get a production version working. However, in the
> long term, maintaining a clone of Hive is undesirable. One possible
> resolution is to factor the metastore+CLI+DDL components out of Hive and
> move them into Howl (making Hive dependent on Howl).
> Another possible resolution is to remove the copy of Hive from Howl and do
> the build/release engineering necessary to make Howl depend on Hive. As part
> of the incubation process, we plan to work towards resolution of these
> issues.
>
>
> External Dependencies
> The dependencies all have Apache-compatible licenses.
>
>
> Cryptography
> Not applicable.
>
>
> Required Resources
>
> Mailing Lists
> • howl-private for private PMC discussions (with moderated
>   subscriptions)
> • howl-dev
> • howl-commits
> • howl-user
>
> Subversion Directory
> https://svn.apache.org/repos/asf/incubator/howl
>
>
> Issue Tracking
> JIRA Howl (HOWL)
>
>
> Other Resources
> The existing code already has unit tests, so we would like a Hudson
> instance to run them whenever a new patch is submitted. This can be added
> after project creation.
>
>
> Initial Committers
> • Devaraj Das
> • Ashutosh Chauhan
> • Sushanth Sowmyan
> • Mac Yang
> • Paul Yang
> • Alan Gates
>
> A CLA is already on file for Sushanth.
>
>
> Affiliations
> • Devaraj Das (Yahoo!)
> • Ashutosh Chauhan (Yahoo!)
> • Sushanth Sowmyan (Yahoo!)
> • Mac Yang (Yahoo!)
> • Paul Yang (Facebook)
> • Alan Gates (Yahoo!)
>
> Sponsors
>
> Champion
> Owen O’Malley
>
>
> Nominated Mentors
> • Olga Natkovich (Pig PMC member and Apache VP for Pig)
> • Alan Gates (Pig PMC member)
> • John Sichi (Hive PMC member)
>
> Sponsoring Entity
> We are requesting the Incubator to sponsor this project.

--000e0cd5c924a36433049ceeb87d--