From: Andrew Hart
Date: Tue, 14 Sep 2010 10:04:04 -0700
To: general@incubator.apache.org
Subject: Re: [PROPOSAL] Gora to enter Incubator

+1 (not binding)

This really strikes a chord with me, and I would love to help out with this
project in any way that I can. I'm a committer on the incubating OODT project
and have experience with a variety of the "traditional" ORMs, developing web
interfaces, and data modeling.

-Andrew.

On 9/14/10 9:47 AM, Doug Cutting wrote:
> +1 Sounds like a great project.
>
> Doug
>
> On 09/13/2010 06:10 AM, Enis Soztutar wrote:
>
>> Hi all,
>>
>> We would like to announce the proposal for Gora, an ORM for column stores,
>> to the Apache Incubator. We believe that Gora can find a nice home at
>> Apache.
>>
>> The wiki page for the proposal can be found at
>> http://wiki.apache.org/incubator/GoraProposal
>>
>> The proposal is as below.
>>
>>
>> = Gora Proposal for Apache Incubation =
>>
>> == Abstract ==
>> Gora is an ORM framework for column stores such as Apache HBase and Apache
>> Cassandra with a specific focus on Hadoop.
>>
>> == Proposal ==
>> Although there are various excellent ORM frameworks for relational
>> databases, data modeling in NoSQL data stores differs profoundly from data
>> modeling in their relational cousins. Moreover, data-model-agnostic
>> frameworks such as JDO are not sufficient for use cases where one needs to
>> use the full power of the data models in column stores. Gora fills this gap
>> by giving the user an easy-to-use ORM framework with data-store-specific
>> mappings and built-in Apache Hadoop support.
>>
>> The overall goal for Gora is to become the standard data representation and
>> persistence framework for big data. The roadmap of Gora can be grouped as
>> follows.
>>
>> * Data Persistence : Persisting objects to column stores such as HBase,
>> Cassandra, Hypertable; key-value stores such as Voldemort, Redis, etc.; SQL
>> databases such as MySQL, HSQLDB; flat files in the local file system or
>> Hadoop HDFS.
>> * Data Access : An easy-to-use, Java-friendly common API for accessing the
>> data regardless of its location.
>> * Indexing : Persisting objects to Lucene and Solr indexes,
>> accessing/querying the data with the Gora API.
>> * Analysis : Accessing the data and analyzing it through adapters for
>> Apache Pig, Apache Hive and Cascading.
>> * MapReduce support : Out-of-the-box and extensive MapReduce (Apache
>> Hadoop) support for data in the data store.
>>
>> == Background ==
>> ORM stands for Object-Relational Mapping. It is a technology which abstracts
>> the persistence layer (mostly relational databases) so that plain
>> domain-level objects can be used, without the cumbersome effort of saving
>> and loading the data to and from the database. Gora differs from current
>> solutions in that:
>> * Gora is specifically focused on NoSQL data stores, but also has limited
>> support for SQL databases.
>> * The main use case for Gora is to access/analyze big data using Hadoop.
>> * Gora uses Avro for bean definition, not byte code enhancement or
>> annotations.
>> * Object-to-data-store mappings are backend specific, so that the full data
>> model can be utilized.
>> * Gora is simple since it ignores complex SQL mappings.
>> * Gora will support persistence, indexing and analysis of data, using Pig,
>> Lucene, Hive, etc.
>>
>> == Rationale ==
>> ORM frameworks are nothing new. But with the explosion of data generated in
>> terabytes and even petabytes, NoSQL data stores are gaining ever-increasing
>> popularity. Coupled with the limited support for the already-proven Apache
>> Hadoop in current ORM frameworks, this created a need for a new project.
>>
>> Gora is currently hosted at GitHub. However, Gora has ties to the ASF in
>> many ways. As detailed in the Proposal section, Gora will be a high-level
>> client for many Apache projects and subprojects including Hadoop (common,
>> hdfs, and mapreduce), HBase, Cassandra, Avro, Lucene, Solr, Pig, and Hive.
>> Gora already uses Hadoop, HBase, Cassandra and Avro.
>> Moreover, Gora started its life inside the Apache Nutch project, and now
>> Nutch trunk uses Gora as a library. Even more, the initial set of
>> committers are all ASF members.
>> Therefore, we think that Apache will be an excellent home for Gora.
>>
>> == Initial Goals ==
>> Initial goals for Gora can be summarized as:
>> * Iron out the remaining issues with HBase, Cassandra and SQL support.
>> * Make the first release before the end of the year.
>> * Improve documentation.
>> * Add support for Cascading.
>>
>> == Current Status ==
>> === Meritocracy ===
>> Current commit rights belong to the initial list of committers, four of
>> whom are also ASF members. All the developers have extensive experience
>> with Apache projects. We honor the meritocracy policy of the ASF.
>>
>> === Community ===
>> Gora’s community mostly overlaps with those of Nutch, Hadoop, HBase, Avro
>> and Cassandra. We have a small community for now (5 initial committers, 18
>> people tracking the project at GitHub), but have been piggybacking on the
>> Nutch community for a while. If Gora is accepted into the Apache Incubator,
>> we expect more traction. Moreover, with the increasing popularity of NoSQL
>> databases, we expect more users.
>>
>> === Core Developers ===
>> Gora started from an initial code base written inside Apache Nutch by
>> Doğacan Güney. Enis Söztutar then refactored and re-architected the project
>> out of Nutch. Later, Julien Nioche, Andrzej Bialecki and Doğacan ported
>> Nutch to use the newly formed project. After that, Sertan Alkan joined.
>> Doğacan and Julien are Nutch PMC members, and Andrzej is the Nutch PMC
>> chair. Enis is an Apache Hadoop PMC member.
>>
>> === Alignment ===
>> As discussed in the second paragraph of the Rationale section, all of the
>> current developers are Apache people, and four of them are PMC members,
>> which shows that we have some experience with the Apache way.
>> Moreover, Gora is closely related to many Apache projects: Nutch, Hadoop,
>> HBase, Cassandra, Avro, Pig, Hive and Lucene, to name a few. Gora started
>> its life inside Nutch, and now Nutch trunk uses Gora to persist web crawl
>> data to HBase, Cassandra and MySQL, which means that Gora is a critical
>> component of Nutch.
>>
>> == Known Risks ==
>> === Orphaned Products ===
>> Most of the development depends on Enis and Doğacan for now. Both of them
>> intend to continue Gora development. However, we also acknowledge that more
>> core developers are needed for the project to be truly successful. The
>> general strategy to acquire more developers will be to acquire more users,
>> and to encourage users to be active in the community and develop patches.
>> Moreover, the next release of Nutch, planned before the end of 2010, has
>> extensive Gora support. We expect more interest from the Nutch community,
>> and we will continue to post Gora announcements to the Hadoop, HBase and
>> Cassandra mailing lists.
>>
>> === Inexperience with Open Source ===
>> We believe that all of the developers have extensive open source experience.
>> Four of the initial committers are Apache members. The codebase has also
>> been open source since April 2010. We also have some documentation, wiki
>> pages, an issue tracker and a dev mailing list.
>>
>> === Homogeneous Developers ===
>> We have a semi-distributed development environment where Doğacan, Enis and
>> Sertan share the same office, but Andrzej and Julien are independent. With
>> the aim of acquiring more developers, we expect more heterogeneous
>> development.
>>
>> === Reliance on Salaried Developers ===
>> Gora development has been supported by the [[ant.com]] search engine as
>> contract work. It is expected that this contract will continue in the
>> future.
>> However, even without sponsors, we are committed to continuing Gora
>> development, since we believe in the technology it brings and its vital
>> role in Nutch and in our other closed-source projects.
>>
>> === Relationships with Other Apache Products ===
>> Gora will be closely related to many Apache projects:
>>
>> * Nutch : Apache Nutch was home to Gora’s initial code base. Now, Nutch
>> trunk uses Gora as a library. The next release of Nutch, planned before the
>> end of 2010, will be using Gora’s first release.
>> * Hadoop : Gora has extensive support for Hadoop MapReduce. Gora defines all
>> the necessary data structures for working with Hadoop. Data stored in
>> column-oriented data stores can be analyzed with Gora using Hadoop.
>> * Avro : Gora uses and extends Avro. Data beans in Gora are defined using
>> Avro schemas, and compiled into Java code with the extended version of the
>> Avro compiler. Avro is also used in data serialization.
>> * HBase : Gora supports HBase as a persistence backend.
>> * Cassandra : Gora supports Cassandra as a persistence backend.
>> * Lucene/Solr : Gora intends to support Lucene/Solr as a persistence and
>> indexing backend.
>> * Pig : Gora intends to support Pig for data analysis.
>> * Hive : Gora intends to support Hive for data analysis.
>>
>> === An Excessive Fascination with the Apache Brand ===
>> Gora is a natural fit for Apache due to its current committers and depending
>> projects.
>>
>> == Documentation ==
>> * The project is currently hosted at http://github.com/enis/gora/.
>> * Wiki pages can be found at http://wiki.github.com/enis/gora/.
>> * List of issues can be found at http://github.com/enis/gora/issues/.
>> * Current web address: http://groups.google.com/group/gora-dev.
>> * Current email address: gora-dev@googlegroups.com.
>>
>> == Initial Source ==
>> The initial source was developed as a patch to the Apache Nutch project.
>> But the storage abstraction layer was orthogonal to the web crawler, and we
>> decided to extract it to a separate project with much wider goals. Thus
>> Gora, as a project, was born. The initial code was developed by Enis and
>> Doğacan with ant.com’s sponsorship.
>>
>> The code can be found at http://github.com/enis/gora/.
>>
>> == External Dependencies ==
>> External dependencies, excluding Apache projects, are as follows:
>> * JDOM - http://jdom.org/ - Apache-style license
>> * SQL Builder - http://openhms.sourceforge.net/sqlbuilder/ - Artistic
>> License, LGPL. SQL Builder is intended to be removed from the source for
>> technical reasons anyway.
>> * HSQLDB - http://hsqldb.org/ - BSD-style license
>> * JUnit - http://junit.org - Common Public License 1.0
>> * SLF4J - http://www.slf4j.org/ - MIT License
>> * Google Guava Libraries - http://code.google.com/p/guava-libraries/ -
>> Apache License 2.0
>>
>>
>> == Required Resources ==
>>
>> === Mailing Lists ===
>>
>> * gora-private (with moderated subscriptions)
>> * gora-dev
>> * gora-commits
>>
>> === Subversion Directory ===
>>
>> * [[http://svn.apache.org/repos/asf/incubator/gora]]
>>
>> === Issue Tracking ===
>> * JIRA (GORA)
>>
>> === Other Resources ===
>> We need a wiki at http://wiki.apache.org. Currently, we have a wiki at
>> GitHub. Since there are not many pages there, we can manually move the
>> pages to the wiki at wiki.apache.org.
>>
>> == Initial Committers ==
>>
>> Name              Email                       Affiliation    Timezone
>> Enis Söztutar     enis [at] apache.org        Konneka        +3
>> Doğacan Güney     dogacan [at] apache.org     Konneka        +3
>> Sertan Alkan      sertanalkan [at] gmail.com  Konneka        +3
>> Julien Nioche     jnioche [at] apache.org     DigitalPebble  +1
>> Andrzej Bialecki  ab [at] apache.org          Sigram
>>
>>
>> === Affiliations ===
>> All of the parties are affiliated with open source consulting shops.
>> Most of the development was sponsored by ant.com; however, we expect that
>> the amount of volunteer work will increase, and more developers will come
>> on board.
>>
>> == Sponsors ==
>>
>> === Champion ===
>> * Chris Mattmann (mattmann AT apache DOT org)
>>
>> === Nominated Mentors ===
>> * Chris Mattmann (mattmann AT apache DOT org)
>> * Andrzej Bialecki (ab AT apache DOT org)
>>
>> === Sponsoring Entity ===
>> Apache Incubator. Successful graduation can result in Gora becoming either
>> a TLP or a subproject of Hadoop, since most of the community is projected
>> to overlap.
>>
>>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org