From: Tom White
Date: Wed, 22 Sep 2010 09:22:03 -0700
Subject: Re: [VOTE] Accept Gora into the Apache Incubator
To: general@incubator.apache.org

+1 (binding)

Tom

On Sun, Sep 19, 2010 at 8:21 PM, Mattmann, Chris A (388J) wrote:
> Hi Folks,
>
> Over the past week or so we've been discussing the Gora project and
> bringing it into the Apache Incubator [1]. It's time to call a VOTE
> thread on the issue. Please VOTE below:
>
> [ ] +1 Accept Gora into the Apache Incubator.
> [ ] +0 Don't care.
> [ ] -1 Don't accept Gora into the Apache Incubator because...
>
> I'll leave the VOTE open for the remainder of the week (ending on 9/24).
> Here's my +1 (IPMC binding).
>
> [1] http://s.apache.org/MPw
>
> Cheers,
> Chris
>
> P.S. The wiki text for the proposal is pasted below.
>
> ----------
> Gora Proposal for Apache Incubation
>
> Abstract
> Gora is an ORM framework for column stores such as Apache HBase and
> Apache Cassandra, with a specific focus on Hadoop.
>
> Proposal
> Although there are various excellent ORM frameworks for relational
> databases, data modeling in NoSQL data stores differs profoundly from
> that of their relational cousins. Moreover, data-model-agnostic
> frameworks such as JDO are not sufficient for use cases where one needs
> to use the full power of the data models in column stores. Gora fills
> this gap by giving the user an easy-to-use ORM framework with
> data-store-specific mappings and built-in Apache Hadoop support.
>
> The overall goal for Gora is to become the standard data representation
> and persistence framework for big data. The roadmap of Gora can be
> grouped as follows.
> * Data Persistence : Persisting objects to column stores such as HBase,
> Cassandra, Hypertable; key-value stores such as Voldemort, Redis, etc.;
> SQL databases such as MySQL and HSQLDB; and flat files in the local file
> system or Hadoop HDFS.
> * Data Access : An easy-to-use, Java-friendly common API for accessing
> the data regardless of its location.
> * Indexing : Persisting objects to Lucene and Solr indexes, and
> accessing/querying the data with the Gora API.
> * Analysis : Accessing the data and performing analysis through adapters
> for Apache Pig, Apache Hive and Cascading.
> * MapReduce support : Out-of-the-box and extensive MapReduce (Apache
> Hadoop) support for data in the data store.
>
> Background
> ORM stands for Object-Relational Mapping. It is a technology which
> abstracts the persistence layer (mostly relational databases) so that
> plain domain-level objects can be used, without the cumbersome effort of
> saving and loading the data to and from the database.
> Gora differs from current solutions in that:
> * Gora is specifically focused on NoSQL data stores, but also has
> limited support for SQL databases.
> * The main use case for Gora is to access/analyze big data using Hadoop.
> * Gora uses Avro for bean definition, not byte code enhancement or
> annotations.
> * Object-to-data-store mappings are backend specific, so that the full
> data model can be utilized.
> * Gora is simple since it ignores complex SQL mappings.
> * Gora will support persistence, indexing and analysis of data, using
> Pig, Lucene, Hive, etc.
>
> Rationale
> ORM frameworks are nothing new. But with the explosion of data generated
> in terabytes and even petabytes, NoSQL data stores are gaining
> ever-increasing popularity. Coupled with the limited support for the
> already-proven Apache Hadoop in current ORM frameworks, there was a need
> for a new project.
>
> Gora is currently hosted at Github. However, Gora has ties to the ASF in
> many ways. As detailed in the Proposal section, Gora will be a
> high-level client for many Apache projects and subprojects including
> Hadoop (common, hdfs, and mapreduce), HBase, Cassandra, Avro, Lucene,
> Solr, Pig, and Hive. Gora already uses Hadoop, HBase, Cassandra and
> Avro. Moreover, Gora started its life inside the Apache Nutch project,
> and now Nutch trunk uses Gora as a library. Furthermore, the initial set
> of committers are all ASF members. Therefore, we think that Apache will
> be an excellent home for Gora.
>
> Initial Goals
> Initial goals for Gora can be summarized as:
> * Iron out the remaining issues with HBase, Cassandra and SQL support.
> * Make the first release before the end of the year.
> * Improve documentation.
> * Support for Cascading.
>
> Current Status
> Meritocracy
> Current commit rights belong to the initial list of committers, four of
> whom are also ASF members. All the developers have extensive experience
> with Apache projects. We honor the meritocracy policy of the ASF.
>
> Community
> Gora's community mostly overlaps with that of Nutch, Hadoop, HBase, Avro
> and Cassandra. We have a small community for now (5 initial committers,
> 18 people tracking the project at Github), but have been piggybacking on
> the Nutch community for a while. If Gora is accepted into the Apache
> Incubator, we expect more traction. Moreover, with the increasing
> popularity of NoSQL databases, we expect more users.
>
> Core Developers
> Gora started from the initial code base written inside Apache Nutch by
> Doğacan Güney. Then Enis Söztutar refactored and re-architected the
> project out of Nutch. Later Julien Nioche, Andrzej Bialecki and Doğacan
> ported Nutch to use the newly formed project. Later, Sertan Alkan
> joined. Doğacan and Julien are Nutch PMC members; Andrzej is the Nutch
> PMC chair. Enis is an Apache Hadoop PMC member.
>
> Alignment
> As discussed in the second paragraph of the Rationale section, all of
> the current developers are Apache people, and four of them are PMC
> members, which shows that we have some experience with the Apache way.
> Moreover, Gora is tightly related to many Apache projects: Nutch,
> Hadoop, HBase, Cassandra, Avro, Pig, Hive and Lucene, to name a few.
> Gora started its life inside Nutch, and now Nutch trunk uses Gora to
> persist web crawl data to HBase, Cassandra and MySQL, which means that
> Gora is a very critical component in Nutch.
>
> Known Risks
> Orphaned Products
> Most of the development depends on Enis and Doğacan for now. Both of
> them intend to continue Gora development. However, we also acknowledge
> that more core developers are needed for the project to be truly
> successful. The general strategy to acquire more developers will be to
> acquire more users, and encourage users to be active in the community
> and develop patches. Moreover, the next release of Nutch, planned before
> the end of 2010, has extensive Gora support.
> We expect more interest from the Nutch community, and we will continue
> to post Gora announcements to the Hadoop, HBase and Cassandra mailing
> lists.
>
> Inexperience with Open Source
> We believe that all of the developers have extensive open source
> experience. Four of the initial committers are Apache members. The
> codebase has also been open source since April 2010. We also have some
> documentation, wiki pages, an issue tracker and a dev mailing list.
>
> Homogeneous Developers
> We have a semi-distributed development environment where Doğacan, Enis
> and Sertan share the same office, but Andrzej and Julien are
> independent. With the aim of acquiring more developers, we expect more
> heterogeneous development.
>
> Reliance on Salaried Developers
> Gora development has been supported by the ant.com search engine as
> contract work. It is expected that this contract will continue in the
> future. However, even without sponsors, we are committed to continuing
> Gora development, since we believe in the technology it brings and its
> vital role in Nutch and our other closed-source projects.
>
> Relationships with Other Apache Products
> Gora will be tightly related to many Apache projects:
> * Nutch : Apache Nutch was the home of Gora's initial code base. Now,
> Nutch trunk uses Gora as a library. The next release of Nutch, planned
> before the end of 2010, will be using Gora's first release.
> * Hadoop : Gora has extensive support for Hadoop MapReduce. Gora defines
> all the necessary data structures for working with Hadoop. Data stored
> in column-oriented data stores can be analyzed with Gora using Hadoop.
> * Avro : Gora uses and extends Avro. Data beans in Gora are defined
> using Avro schemas, and compiled into Java code with the extended
> version of the Avro compiler. Avro is also used in data serialization.
> * HBase : Gora supports HBase as a persistence backend.
> * Cassandra : Gora supports Cassandra as a persistence backend.
> * Lucene/Solr : Gora intends to support Lucene/Solr as a persistence and
> indexing backend.
> * Pig : Gora intends to support Pig for data analysis.
> * Hive : Gora intends to support Hive for data analysis.
>
> An Excessive Fascination with the Apache Brand
> Gora is a natural fit for Apache due to its current committers and
> depending projects.
>
> Documentation
> * The project is currently hosted at http://github.com/enis/gora/.
> * Wiki pages can be found at http://wiki.github.com/enis/gora/.
> * List of issues can be found at http://github.com/enis/gora/issues/.
> * Current web address: http://groups.google.com/group/gora-dev.
> * Current email address: gora-dev@googlegroups.com.
>
> Initial Source
> The initial source was developed as a patch to the Apache Nutch project.
> But the storage abstraction layer was orthogonal to the web crawler, and
> we decided to extract it to a separate project with much wider goals.
> Thus Gora, as a project, was born. The initial code was developed by
> Enis and Doğacan with ant.com's sponsorship.
>
> The code can be found at http://github.com/enis/gora/.
>
> External Dependencies
> External dependencies excluding Apache projects are as follows:
> * JDOM - http://jdom.org/ - Apache-style license
> * SQL Builder - http://openhms.sourceforge.net/sqlbuilder/ - Artistic
> License, LGPL. SQL Builder is intended to be removed from the source for
> technical reasons anyway.
> * HSQLDB - http://hsqldb.org/ - BSD-style license
> * JUnit - http://junit.org - Common Public License 1.0
> * SLF4J - http://www.slf4j.org/ - MIT License
> * Google Guava Libraries - http://code.google.com/p/guava-libraries/ -
> Apache License 2.0
>
> Required Resources
> Mailing Lists
> * gora-private (with moderated subscriptions)
> * gora-dev
> * gora-commits
>
> Subversion Directory
> * http://svn.apache.org/repos/asf/incubator/gora
>
> Issue Tracking
> * JIRA (GORA)
>
> Other Resources
> We need a wiki at http://wiki.apache.org. Currently, we have a wiki at
> Github. Since there are not a lot of pages there, we can manually move
> the pages to the wiki at wiki.apache.org.
>
> Initial Committers
> Name              Email                       Affiliation    Timezone
> Enis Söztutar     enis [at] apache.org        Konneka        +3
> Doğacan Güney     dogacan [at] apache.org     Konneka        +3
> Sertan Alkan      sertanalkan [at] gmail.com  Konneka        +3
> Julien Nioche     jnioche [at] apache.org     DigitalPebble  +1
> Andrzej Bialecki  ab [at] apache.org          Sigram
> Andrew Hart       ahart [at] apache.org       NASA JPL       -8
> Dave Woollard     woollard [at] apache.org    NASA JPL       -8
> Henry Saputra     hsaputra [at] apache.org    Yahoo!         -8
>
> Affiliations
> All of the parties are affiliated with companies and organizations that
> are familiar with the development of open source. Most of the original
> Gora development was sponsored by ant.com; however, we expect that the
> amount of volunteer work will increase, and more developers will come on
> board.
>
> Sponsors
> Champion
> * Chris Mattmann (mattmann AT apache DOT org)
> Nominated Mentors
> * Chris Mattmann (mattmann AT apache DOT org)
> * Andrzej Bialecki (ab AT apache DOT org)
> * Tom White (tomwhite AT apache DOT org)
> Sponsoring Entity
> Apache Incubator. Successful graduation can result in Gora becoming
> either a TLP or a subproject of Hadoop, since most of the community is
> projected to overlap.
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: Chris.Mattmann@jpl.nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++