From: Tom White
Date: Tue, 14 Sep 2010 15:16:18 -0700
Subject: Re: [PROPOSAL] Gora to enter Incubator
To: general@incubator.apache.org

I posted a little earlier volunteering to be a mentor, but it looks
like it may be in the moderation queue. Anyway, +1 to the proposal,
and happy to help out if you still need a mentor.

Cheers,
Tom

On Tue, Sep 14, 2010 at 2:44 PM, Mattmann, Chris A (388J) wrote:
> Hi Folks,
>
> FYI, if any mentors out there have free cycles and are interested, we are looking for 1 more mentor to fulfill the Incubator mentor requirements.
>
> Thanks,
> Chris
>
>
>
> On 9/13/10 6:10 AM, "Enis Soztutar" wrote:
>
> Hi all,
>
> We would like to announce the Proposal for Gora, an ORM for Column Stores,
> for the Apache Incubation. We believe that Gora can find a nice home at
> Apache.
>
> Wiki of the proposal can be found at
> http://wiki.apache.org/incubator/GoraProposal
>
> The proposal is as below.
>
>
> = Gora Proposal for Apache Incubation =
>
> == Abstract ==
> Gora is an ORM framework for column stores such as Apache HBase and Apache
> Cassandra with a specific focus on Hadoop.
>
> == Proposal ==
> Although there are various excellent ORM frameworks for relational
> databases, data modeling in NoSQL data stores differs profoundly from their
> relational cousins.
> Moreover, data-model agnostic frameworks such as JDO are
> not sufficient for use cases where one needs to use the full power of the
> data models in column stores. Gora fills this gap by giving the user an
> easy-to-use ORM framework with data store specific mappings and built-in
> Apache Hadoop support.
>
> The overall goal for Gora is to become the standard data representation and
> persistence framework for big data. The roadmap of Gora can be grouped as
> follows.
>
>  * Data Persistence : Persisting objects to column stores such as HBase,
> Cassandra, Hypertable; key-value stores such as Voldemort, Redis, etc.; SQL
> databases such as MySQL, HSQLDB; flat files in the local file system or Hadoop
> HDFS.
>  * Data Access : An easy-to-use, Java-friendly common API for accessing the
> data regardless of its location.
>  * Indexing : Persisting objects to Lucene and Solr indexes,
> accessing/querying the data with the Gora API.
>  * Analysis : Accessing the data and performing analysis through adapters for
> Apache Pig, Apache Hive and Cascading.
>  * MapReduce support : Out-of-the-box and extensive MapReduce (Apache
> Hadoop) support for data in the data store.
>
> == Background ==
> ORM stands for Object-Relational Mapping. It is a technology which abstracts
> the persistence layer
> (mostly relational databases) so that plain domain-level objects can be
> used, without the cumbersome effort to save/load the data to and from the
> database. Gora differs from current solutions in that:
>  * Gora is specifically focused on NoSQL data stores, but also has limited
> support for SQL databases.
>  * The main use case for Gora is to access/analyze big data using Hadoop.
>  * Gora uses Avro for bean definition, not byte code enhancement or
> annotations.
>  * Object-to-data store mappings are backend specific, so that the full data
> model can be utilized.
>  * Gora is simple since it ignores complex SQL mappings.
>  * Gora will support persistence, indexing and analysis of data, using Pig,
> Lucene, Hive, etc.
>
> == Rationale ==
> ORM frameworks are nothing new. But with the explosion of data generated in
> terabytes and even petabytes, NoSQL data stores are gaining ever-increasing
> popularity. Coupled with the limited support for the already-proven Apache Hadoop
> in current ORM frameworks, there was a need for a new project.
>
> Gora is currently hosted at Github. However, Gora has ties to ASF in many
> ways. As detailed in the proposal section, Gora will be a high level client
> for many Apache projects and subprojects including Hadoop (common, hdfs, and
> mapreduce), HBase, Cassandra, Avro, Lucene, Solr, Pig, and Hive. Gora
> already uses Hadoop, HBase, Cassandra and Avro. Moreover, Gora started its
> life inside the Apache Nutch project, and now Nutch trunk uses Gora as a
> library. Even more, the initial set of committers are all ASF members.
> Therefore, we think that Apache will be an excellent home for Gora.
>
> == Initial Goals ==
> Initial goals for Gora can be summarized as:
>  * Iron out the remaining issues with HBase, Cassandra and SQL support.
>  * Make the first release before the end of the year.
>  * Improve documentation.
>  * Support for Cascading.
>
> == Current Status ==
> === Meritocracy ===
> Current commit rights belong to the initial list of committers, four of whom
> are also ASF members. All the developers have extensive experience with
> Apache projects. We honor the meritocracy policy of the ASF.
>
> === Community ===
> Gora’s community mostly overlaps with that of Nutch, Hadoop, HBase, Avro and
> Cassandra. We
> have a small community for now (5 initial committers, 18 people tracking the
> project at Github), but have been piggybacking on the Nutch community for a
> while.
> If Gora is accepted to Apache Incubator, we expect more traction.
> Moreover, with the increasing popularity of NoSQL databases, we expect more
> users.
>
> === Core Developers ===
> Gora was started from the initial code base inside Apache Nutch by Doğacan
> Güney. Then Enis Söztutar refactored and re-architected the project out
> of Nutch. Later Julien Nioche, Andrzej Bialecki and Doğacan ported Nutch
> to use the newly formed project. Later, Sertan Alkan joined. Doğacan and
> Julien are Nutch PMC members, Andrzej is the Nutch PMC chair. Enis is an
> Apache Hadoop PMC member.
>
> === Alignment ===
> As discussed in the second paragraph of the Rationale section, all of the
> current developers are Apache people, and four of them are PMC members,
> which shows that we have some experience with the Apache way. Moreover, Gora
> is tightly related to many Apache projects: Nutch, Hadoop, HBase,
> Cassandra, Avro, Pig, Hive, Lucene to name a few. Gora started its life
> inside Nutch, and now Nutch trunk uses Gora to persist web crawl data to
> HBase, Cassandra and MySQL, which means that Gora is a very critical
> component in Nutch.
>
> == Known Risks ==
> === Orphaned Products ===
> Most of the development depends on Enis and Doğacan for now. Both of them
> intend to continue Gora development. However, we also acknowledge that more
> core developers are needed for the project to be truly successful. The
> general strategy to acquire more developers will be to acquire more users,
> and encourage users to be active in the community and develop patches.
> Moreover, the next release of Nutch, planned before the end of 2010, has
> extensive Gora support. We expect more interest from the Nutch community, and we
> will continue to post Gora announcements to the Hadoop, HBase and Cassandra
> mailing lists.
>
> === Inexperience with Open Source ===
> We believe that all of the developers have extensive open source experience.
> Four of the initial committers are Apache members. The codebase has also been open
> source since April 2010. We also have some documentation, wiki pages, an issue
> tracker and a dev mailing list.
>
> === Homogeneous Developers ===
> We have a semi-distributed development environment where Doğacan, Enis and
> Sertan share the same office, but Andrzej and Julien are independent. With
> the aim of acquiring more developers, we expect more heterogeneous
> development.
>
> === Reliance on Salaried Developers ===
> Gora development has been supported by the [[ant.com]] search engine as
> contract work. It is expected that this contract will continue in the
> future. However, even without sponsors, we are committed to continuing Gora
> development, since we believe in the technology it brings and its vital
> role in Nutch and our other closed source projects.
>
> === Relationships with Other Apache Products ===
> Gora will be tightly related to many Apache projects:
>
>  * Nutch : Apache Nutch was home to Gora’s initial code base. Now, Nutch
> trunk uses Gora as a library. The next release of Nutch, planned before the
> end of 2010, will be using Gora’s first release.
>  * Hadoop : Gora has extensive support for Hadoop MapReduce. Gora defines all
> the necessary data structures for working with Hadoop. Data stored in column
> oriented data stores can be analyzed with Gora using Hadoop.
>  * Avro : Gora uses and extends Avro. Data beans in Gora are defined using
> Avro schemas, and compiled into Java code with the extended version of the
> Avro compiler. Avro is also used in data serialization.
>  * HBase : Gora supports HBase as a persistency backend.
>  * Cassandra : Gora supports Cassandra as a persistency backend.
>  * Lucene/Solr : Gora intends to support Lucene/Solr as a persistency and
> indexing backend.
>  * Pig : Gora intends to support Pig for data analysis.
>  * Hive : Gora intends to support Hive for data analysis.
>
> === An Excessive Fascination with the Apache Brand ===
> Gora is a natural fit for Apache due to its current committers and depending
> projects.
>
> == Documentation ==
>  * The project is currently hosted at http://github.com/enis/gora/.
>  * Wiki pages can be found at http://wiki.github.com/enis/gora/.
>  * List of issues can be found at http://github.com/enis/gora/issues/.
>  * Current web address: http://groups.google.com/group/gora-dev.
>  * Current email address: gora-dev@googlegroups.com.
>
> == Initial Source ==
> The initial source was developed as a patch to the Apache Nutch project. But
> the storage abstraction layer was orthogonal to the web crawler, and we
> decided to extract it to a separate project with much wider goals. Thus
> Gora, as a project, was born. The initial code was developed by Enis and
> Dogacan with ant.com’s sponsorship.
>
> The code can be found at http://github.com/enis/gora/.
>
> == External Dependencies ==
> External dependencies excluding Apache projects are as follows:
>  * JDOM - http://jdom.org/ - Apache-style license
>  * SQL Builder - http://openhms.sourceforge.net/sqlbuilder/ - Artistic
> License, LGPL. SQL Builder is intended to be removed from the source due to
> technical reasons anyway.
>  * HSQLDB - http://hsqldb.org/ - BSD-style license
>  * JUnit - http://junit.org - Common Public License 1.0
>  * SLF4J - http://www.slf4j.org/ - MIT License
>  * Google Guava Libraries - http://code.google.com/p/guava-libraries/ -
> Apache License 2.0
>
>
> == Required Resources ==
>
> === Mailing Lists ===
>
>  * gora-private (with moderated subscriptions)
>  * gora-dev
>  * gora-commits
>
> === Subversion Directory ===
>
>  * [[http://svn.apache.org/repos/asf/incubator/gora]]
>
> === Issue Tracking ===
>  * JIRA (GORA)
>
> === Other Resources ===
> We need a wiki at http://wiki.apache.org. Currently, we have a wiki at
> Github. Since there are not a lot of pages there, we can manually move the
> pages to the wiki at wiki.apache.org.
>
> == Initial Committers ==
>
> Name              Email                       Affiliation    Timezone
> Enis Söztutar     enis [at] apache.org        Konneka        +3
> Doğacan Güney     dogacan [at] apache.org     Konneka        +3
> Sertan Alkan      sertanalkan [at] gmail.com  Konneka        +3
> Julien Nioche     jnioche [at] apache.org     DigitalPebble  +1
> Andrzej Bialecki  ab [at] apache.org          Sigram
>
>
> === Affiliations ===
> All of the parties are affiliated with open source consulting shops. Most of
> the development was sponsored by ant.com; however, we expect that the amount
> of volunteer work will increase, and more developers will come on board.
>
> == Sponsors ==
>
> === Champion ===
>  * Chris Mattmann (mattmann AT apache DOT org)
>
> === Nominated Mentors ===
>  * Chris Mattmann (mattmann AT apache DOT org)
>  * Andrzej Bialecki (ab AT apache DOT org)
>
> === Sponsoring Entity ===
> Apache Incubator. Successful graduation can result in either being a TLP, or
> a subproject of
> Hadoop, since most of the community is projected to overlap.
>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: Chris.Mattmann@jpl.nasa.gov
> WWW:   http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++