incubator-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mattmann, Chris A (388J)" <chris.a.mattm...@jpl.nasa.gov>
Subject Re: [PROPOSAL] Gora to enter Incubator
Date Tue, 14 Sep 2010 20:12:18 GMT
Hi Henry,

Thanks. If you have cycles, then the help would be most welcomed. Feel free to add yourself
to the list of committers on the wiki page.

Cheers,
Chris



On 9/14/10 10:48 AM, "Henry Saputra" <henry.saputra@gmail.com> wrote:

+1 (non-binding)  very interesting project.

I would definitely love to help on this project any way I can.

- Henry

On Mon, Sep 13, 2010 at 6:10 AM, Enis Soztutar <enis.soz.nutch@gmail.com>wrote:

> Hi all,
>
> We would like to announce the Proposal for Gora, an ORM for Colum Stores,
> for the Apache Incubation. We believe that Gora can find a nice home at
> Apache.
>
> Wiki of the proposal can be found at
> http://wiki.apache.org/incubator/GoraProposal
>
> The proposal is as below.
>
>
> = Gora Proposal for Apache Incubation =
>
> == Abstract ==
> Gora is an ORM framework for column stores such as Apache HBase and Apache
> Cassandra with a specific focus on Hadoop.
>
> == Proposal ==
> Although there are various excellent ORM frameworks for relational
> databases, data modeling in NoSQL data stores differ profoundly from their
> relational cousins. Moreover, data-model agnostic frameworks such as JDO
> are
> not sufficient for use cases, where one needs to use the full power of the
> data models in column stores. Gora fills this gap by giving the user an
> easy-to-use ORM framework with data store specific mappings and built in
> Apache Hadoop support.
>
> The overall goal for Gora is to become the standard data representation and
> persistence framework for big data. The roadmap of Gora can be grouped as
> follows.
>
>  * Data Persistence : Persisting objects to Column stores such as HBase,
> Cassandra, Hypertable; key-value stores such as Voldermort, Redis, etc; SQL
> databases, such as MySQL, HSQLDB, flat files in local file system of Hadoop
> HDFS.
>  * Data Access : An easy to use Java-friendly common API for accessing the
> data regardless of its location.
>  * Indexing : Persisting objects to Lucene and Solr indexes,
> accessing/querying the data with Gora API.
>  * Analysis : Accesing the data and making analysis through adapters for
> Apache Pig, Apache Hive and Cascading
>  * MapReduce support : Out-of-the-box and extensive MapReduce (Apache
> Hadoop) support for data in the data store.
>
> == Background ==
> ORM stands for Object Relation Mapping. It is a technology which abstacts
> the persistency layer
> (mostly Relational Databases) so that plain domain level objects can be
> used, without the cumbersome effort to save/load the data to and from the
> database. Gora differs from current solutions in that:
>  * Gora is specially focussed at NoSQL data stores, but also has limited
> support for SQL databases
>  * The main use case for Gora is to access/analyze big data using Hadoop.
>  * Gora uses Avro for bean definition, not byte code enhancement or
> annotations
>  * Object-to-data store mappings are backend specific, so that full data
> model can be utilized.
>  * Gora is simple since it ignores complex SQL mappings
>  * Gora will support persistence, indexing and anaysis of data, using Pig,
> Lucene, Hive, etc
>
> == Rationale ==
> ORM frameworks are nothing new. But with the explosion of data generated in
> Terabytes and even Petabytes, NoSQL data stores are gaining ever-increasing
> popularity. Coupled with limited support to already-proven Apache Hadoop
> support in current ORM frameworks, there was a need for a new project.
>
> Gora is currently hosted at Github. However, Gora has ties to ASF in many
> ways. As detailed in the proposal section, Gora will be a high level client
> for many Apache projects and subprojects including Hadoop(common, hdfs, and
> mapreduce), HBase, Cassandra, Avro, Lucene, Solr, Pig, and Hive. Gora
> already uses Hadoop, HBase, Cassandra and Avro. Moreover, Gora started its
> life inside Apache Nutch project, and now Nutch trunk uses Gora as a
> library. Even more, the initial set of committers are all ASF members.
> Therefore, we think that Apache will be an excellent home for Gora.
>
> == Initial Goals ==
> Initial goals for Gora can be summarized as:
>  * Iron out the remaining issues with HBase, Cassandra and SQL support.
>  * Make the first release before the end of the year.
>  * Improve documentation
>  * Support for Cascading
>
> == Current Status ==
> === Meritocracy ===
> Current commit rights belong to the initial list of committers four of who
> are also ASF members. All the developers have extensive experience with
> Apache projects. We honor the meritocracy policy of ASF foundation.
>
> === Community ===
> Gora’s community mostly overlap with that of Nutch, Hadoop, HBase, Avro and
> Cassandra. We
> have a small community for now (5 initial committers, 18 people tracking
> the
> project at Github), but have been piggybacking the Nutch community for a
> while. If Gora is accepted to Apache Incubator, we expect more traction.
> Moreover, with the increasing popularity of NoSQL databases, we expect more
> users.
>
> === Core Developers ===
> Gora was started by the initial code base inside Apache Nutch by Doğacan
> Güney. Then Enis Söztutar has refactored and re-architected the project out
> of Nutch. Later Julien Nioche, Andrzej Bialecki and Doğacan has ported
> Nutch
> to use the newly formed project. Later, Sertan Alkan has joined. Doğacan
> and
> Julien are Nutch PMC members, Andrzej is the Nutch PMC chair. Enis is an
> Apache Hadoop PMC member.
>
> === Alignment ===
> As discusssed in the second paragraph of Rationale Section, all of the
> current developers are Apache people, and four of them are PMC members,
> which shows that we have some experience with the Apache way. Moreover,
> Gora
> is tightly related with lots of Apache projects, Nutch, Hadoop, HBase,
> Cassandra, Avro, Pig, Hive, Lucene to name a few. Gora has started its life
> inside Nutch, and now nutch trunk uses Gora to persist web crawl data to
> HBase, Cassandra and MySQL, which means that Gora is a very critical
> component in Nutch.
>
> == Known Risks ==
> === Orphaned Products ===
> Most of the development depends on Enis and Doğacan for now. Both of them
> intent to continue Gora development. However, we also acknowledge that more
> core developers are needed for the project to be truly successful. The
> general strategy to acquire more developers will be to acquire more users,
> and encourage users to be active in the community and develop patches.
> Moreover, the next release of Nutch planned before the end of 2010 has
> extensive Gora support. We expect more interest from Nutch community, and
> we
> will continue to announce Gora notifications at Hadoop,HBase and Cassandra
> mailing lists.
>
> === Inexperience with Open Source ===
> We believe that all of the developers have extensive open source
> experience.
> Four of the initial committers are apache members. The codebase is also
> open
> source since April 2010. We also have some documentation, wiki pages, issue
> tracker and dev mailing list.
>
> === Homogeneous Developers ===
> We have a semi-distributed development environment where Doğacan, Enis and
> Sertan share the same office, but Andrzej and Julien are independent. With
> the aim of acquiring more developers, we expect more heterogeneous
> development.
>
> === Reliance on Salaried Developers ===
> Gora development have been supported by [[ant.com]]  search engine as
> contract work. It is expected that this contract will continue in the
> future. However, even without sponsors, we are commited to continue on Gora
> development, since we believe in the technology it brings and it’s vital
> role in Nutch, and our other closed sourced projects.
>
> === Relationships with Other Apache Products ===
> Gora will be tightly related to lots of Apache projects:
>
>  * Nutch : Apache nutch was to home to Gora’s initial code base. Now, Nutch
> trunk uses Gora as a library. The next relase of Nutch, planned before the
> end of 2010 will be using Gora’s first release.
>  * Hadoop : Gora has extensive support for Hadoop MapReduce Gora defines
> all
> the necessary data structures for working with Hadoop .Data stored in
> column
> oriented data stores can be analyzed  with Gora using Hadoop.
>  * Avro : Gora uses and extends Avro. Data beans in Gora are defined using
> Avro schemas ,and compiled into Java code with the extended version of the
> Avro compiler. Avro is also used in data serialization.
>  * HBase : Gora supports HBase as a persistency backend.
>  * Cassandra : Gora support Cassandra as a persistency backend.
>  * Lucene/Solr : Gora intends to support Lucene/Solr as a persistency and
> indexing backend.
>  * Pig : Gora intends to support Pig for data anaysis
>  * Hive :  Gora intends to support Hive for data analysis
>
> === An Excessive Fascination with the Apache Brand ===
> Gora is a natural fit for Apache due to it's current commiters and
> depending
> projects.
>
> == Documentation ==
>  * The project is currently hosted at http://github.com/enis/gora/.
>  * Wiki pages can be found at http://wiki.github.com/enis/gora/.
>  * List of issues can be found at  http://github.com/enis/gora/issues/.
>  * Current web address: http://groups.google.com/group/gora-dev.
>  * Current email address: gora-dev@googlegroups.com.
>
> == Initial Source ==
> The initial source was developed as a patch to the Apache Nutch project.
> But
> the storage abstraction layer was orthogonal to the web crawler, and we
> decided to extract it to a separate project with much wider goals. Thus
> Gora, as a project, was born. The initial code is developed by Enis and
> Dogacan with ant.com’s sponsorship.
>
> The code can be found at http://github.com/enis/gora/.
>
> == External Dependencies ==
> External dependencies excluding Apache projects are as follows
>  * JDOM - http://jdom.org/ -  Apache-style license
>  * SQL Builder - http://openhms.sourceforge.net/sqlbuilder/ - Artistic
> License, LGPL. SQL Builder is intended to be removed from the source due to
> technical reasons anyway.
>  * HSQLDB - http://hsqldb.org/ - BSD-style license
>  * JUnit - http://junit.org - Common Public License 1.0
>  * SLF4J - http://www.slf4j.org/ - MIT License
>  * Google Guava Libraries - http://code.google.com/p/guava-libraries/ -
> Apache License 2.0
>
>
> == Required Resources ==
>
> === Mailing Lists ===
>
>  * gora-private (with moderated subscriptions)
>  * gora-dev
>  * gora-commits
>
> === Subversion Directory ===
>
>  * [[http://svn.apache.org/repos/asf/incubator/gora]]
>
> === Issue Tracking ===
>  * JIRA (GORA)
>
> === Other Resources ===
> We need a wiki at http://wiki.apache.org. Currently, we have a wiki at
> Github, Since there is not a lot of pages there, we can manually move the
> pages to the wiki at wiki.apache.org.
>
> == Initial Committers ==
>
> Name                   email
> Affiliation        Timezone
> Enis Söztutar       enis [at] apache.org           Konneka         +3
> Doğacan Güney  dogacan [at] apache.org    Konneka         +3
> Sertan Alkan       sertanalkan [at] gmail.com Konneka         +3
> Julien Nioche       jnioche [at] apache.org      DigitalPebble  +1
> Andrzej Bialecki   ab [at] apache.org             Sigram
>
>
> === Affiliations ===
> All of the parties are affiliated with open source consulting shops. Most
> of
> the development was sponsored by ant.com, however we expect that the
> amount
> of volunteer work will increase, and more developers will come on board.
>
> == Sponsors ==
>
> === Champion ===
>  * Chris Mattmann (mattmann AT apache DOT org)
>
> === Nominated Mentors ===
>  * Chris Mattmann (mattmann AT apache DOT org)
>  * Andrzej Bialecki (ab AT apache DOT org )
>
> === Sponsoring Entity ===
> Apache Incubator. Successful graduation can result in either being a TLP,
> or
> a subproject of
> Hadoop, since most of the community is projected to overlap.
>



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message