incubator-general mailing list archives

From Mohammad Nour El-Din <nour.moham...@gmail.com>
Subject Re: [VOTE] Accept Gora into the Apache Incubator
Date Mon, 20 Sep 2010 08:17:39 GMT
+1 (non-binding)

On Mon, Sep 20, 2010 at 8:36 AM, Henry Saputra <henry.saputra@gmail.com> wrote:
> +1 (non-binding)
>
> - Henry
>
> On Sun, Sep 19, 2010 at 8:21 PM, Mattmann, Chris A (388J) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Hi Folks,
>>
>> Over the past week or so we've been discussing the Gora project and
>> bringing it into the Apache Incubator [1]. It's time to call a VOTE
>> thread on the issue. Please VOTE below:
>>
>> [ ] +1 Accept Gora into the Apache Incubator.
>> [ ] +0 Don't care.
>> [ ] -1 Don't accept Gora into the Apache Incubator because...
>>
>> I'll leave the VOTE open for the remainder of the week (ending on 9/24).
>> Here's my +1 (IPMC binding).
>>
>> [1] http://s.apache.org/MPw
>>
>> Cheers,
>> Chris
>>
>> P.S. The wiki text for the proposal is pasted below.
>>
>> ----------
>> Gora Proposal for Apache Incubation
>>
>> Abstract
>> Gora is an ORM framework for column stores such as Apache HBase and Apache
>> Cassandra with a specific focus on Hadoop.
>>
>> Proposal
>> Although there are various excellent ORM frameworks for relational
>> databases, data modeling in NoSQL data stores differs profoundly from
>> that of their relational cousins. Moreover, data-model agnostic
>> frameworks such as JDO are not sufficient for use cases where one needs
>> to use the full power of the data models in column stores. Gora fills
>> this gap by giving the user an easy-to-use ORM framework with data store
>> specific mappings and built-in Apache Hadoop support.
>>
>> The overall goal for Gora is to become the standard data representation and
>> persistence framework for big data. The roadmap of Gora can be grouped as
>> follows.
>> * Data Persistence : Persisting objects to column stores such as HBase,
>> Cassandra, Hypertable; key-value stores such as Voldemort, Redis, etc.;
>> SQL databases such as MySQL, HSQLDB; and flat files in the local file
>> system or Hadoop HDFS.
>> * Data Access : An easy to use Java-friendly common API for accessing the
>> data regardless of its location.
>> * Indexing : Persisting objects to Lucene and Solr indexes,
>> accessing/querying the data with Gora API.
>> * Analysis : Accessing the data and analyzing it through adapters for
>> Apache Pig, Apache Hive and Cascading.
>> * MapReduce support : Out-of-the-box and extensive MapReduce (Apache
>> Hadoop) support for data in the data store.
>>
>> Background
>> ORM stands for Object-Relational Mapping. It is a technology that abstracts
>> the persistence layer (mostly relational databases) so that plain domain
>> level objects can be used, without the cumbersome effort to save/load the
>> data to and from the database. Gora differs from current solutions in that:
>> * Gora is specifically focused on NoSQL data stores, but also has limited
>> support for SQL databases
>> * The main use case for Gora is to access/analyze big data using Hadoop.
>> * Gora uses Avro for bean definition, not byte code enhancement or
>> annotations
>> * Object-to-data store mappings are backend specific, so that the full data
>> model can be utilized.
>> * Gora is simple since it ignores complex SQL mappings
>> * Gora will support persistence, indexing and analysis of data, using Pig,
>> Lucene, Hive, etc.
>>
>> Rationale
>> ORM frameworks are nothing new. But with the explosion of data generated in
>> terabytes and even petabytes, NoSQL data stores are gaining ever-increasing
>> popularity. Coupled with the limited support for the already-proven Apache
>> Hadoop in current ORM frameworks, there was a need for a new project.
>>
>> Gora is currently hosted at Github. However, Gora has ties to the ASF in
>> many ways. As detailed in the proposal section, Gora will be a high-level
>> client for many Apache projects and subprojects including Hadoop (common,
>> hdfs, and mapreduce), HBase, Cassandra, Avro, Lucene, Solr, Pig, and Hive.
>> Gora already uses Hadoop, HBase, Cassandra and Avro. Moreover, Gora started
>> its life inside the Apache Nutch project, and now Nutch trunk uses Gora as
>> a library. Furthermore, the initial set of committers are all ASF members.
>> Therefore, we think that Apache will be an excellent home for Gora.
>>
>> Initial Goals
>> Initial goals for Gora can be summarized as:
>> * Iron out the remaining issues with HBase, Cassandra and SQL support.
>> * Make the first release before the end of the year.
>> * Improve documentation
>> * Support for Cascading
>> Current Status
>> Meritocracy
>> Current commit rights belong to the initial list of committers, four of whom
>> are also ASF members. All the developers have extensive experience with
>> Apache projects. We honor the meritocracy policy of the ASF.
>>
>> Community
>> Gora’s community mostly overlaps with that of Nutch, Hadoop, HBase, Avro and
>> Cassandra. We have a small community for now (5 initial committers, 18
>> people tracking the project at Github), but have been piggybacking on the
>> Nutch community for a while. If Gora is accepted into the Apache Incubator,
>> we expect more traction. Moreover, with the increasing popularity of NoSQL
>> databases, we expect more users.
>>
>> Core Developers
>> Gora grew out of an initial code base written inside Apache Nutch by Doğacan
>> Güney. Enis Söztutar then refactored and re-architected the project out
>> of Nutch. Later, Julien Nioche, Andrzej Bialecki and Doğacan ported Nutch
>> to use the newly formed project, and Sertan Alkan joined afterwards.
>> Doğacan and Julien are Nutch PMC members, and Andrzej is the Nutch PMC
>> chair. Enis is an Apache Hadoop PMC member.
>>
>> Alignment
>> As discussed in the second paragraph of the Rationale section, all of the
>> current developers are Apache people, and four of them are PMC members,
>> which shows that we have some experience with the Apache way. Moreover,
>> Gora is closely related to many Apache projects: Nutch, Hadoop, HBase,
>> Cassandra, Avro, Pig, Hive and Lucene, to name a few. Gora started its life
>> inside Nutch, and now Nutch trunk uses Gora to persist web crawl data to
>> HBase, Cassandra and MySQL, which means that Gora is a very critical
>> component in Nutch.
>>
>> Known Risks
>> Orphaned Products
>> Most of the development depends on Enis and Doğacan for now. Both of them
>> intend to continue Gora development. However, we also acknowledge that more
>> core developers are needed for the project to be truly successful. The
>> general strategy to acquire more developers will be to acquire more users,
>> and encourage users to be active in the community and develop patches.
>> Moreover, the next release of Nutch, planned before the end of 2010, has
>> extensive Gora support. We expect more interest from the Nutch community,
>> and we will continue to post Gora announcements to the Hadoop, HBase and
>> Cassandra mailing lists.
>>
>> Inexperience with Open Source
>> We believe that all of the developers have extensive open source
>> experience. Four of the initial committers are Apache members. The codebase
>> has also been open source since April 2010. We also have some
>> documentation, wiki pages, an issue tracker and a dev mailing list.
>>
>> Homogeneous Developers
>> We have a semi-distributed development environment where Doğacan, Enis and
>> Sertan share the same office, but Andrzej and Julien are independent. With
>> the aim of acquiring more developers, we expect more heterogeneous
>> development.
>>
>> Reliance on Salaried Developers
>> Gora development has been supported by the ant.com search engine as
>> contract work. It is expected that this contract will continue in the
>> future. However, even without sponsors, we are committed to continuing Gora
>> development, since we believe in the technology it brings and its vital
>> role in Nutch and in our other closed-source projects.
>>
>> Relationships with Other Apache Products
>> Gora will be tightly related to lots of Apache projects:
>> * Nutch : Apache Nutch was home to Gora’s initial code base. Now, Nutch
>> trunk uses Gora as a library. The next release of Nutch, planned before the
>> end of 2010, will be using Gora’s first release.
>> * Hadoop : Gora has extensive support for Hadoop MapReduce. Gora defines
>> all the necessary data structures for working with Hadoop. Data stored in
>> column-oriented data stores can be analyzed with Gora using Hadoop.
>> * Avro : Gora uses and extends Avro. Data beans in Gora are defined using
>> Avro schemas, and compiled into Java code with the extended version of the
>> Avro compiler. Avro is also used in data serialization.
>> * HBase : Gora supports HBase as a persistence backend.
>> * Cassandra : Gora supports Cassandra as a persistence backend.
>> * Lucene/Solr : Gora intends to support Lucene/Solr as a persistence and
>> indexing backend.
>> * Pig : Gora intends to support Pig for data analysis
>> * Hive : Gora intends to support Hive for data analysis
>>
>> An Excessive Fascination with the Apache Brand
>> Gora is a natural fit for Apache due to its current committers and
>> dependent projects.
>>
>> Documentation
>> * The project is currently hosted at http://github.com/enis/gora/.
>> * Wiki pages can be found at http://wiki.github.com/enis/gora/.
>> * List of issues can be found at http://github.com/enis/gora/issues/.
>> * Current web address: http://groups.google.com/group/gora-dev.
>> * Current email address: gora-dev@googlegroups.com.
>>
>> Initial Source
>> The initial source was developed as a patch to the Apache Nutch project.
>> But the storage abstraction layer was orthogonal to the web crawler, and we
>> decided to extract it to a separate project with much wider goals. Thus
>> Gora, as a project, was born. The initial code was developed by Enis and
>> Doğacan with ant.com’s sponsorship.
>>
>> The code can be found at http://github.com/enis/gora/.
>>
>> External Dependencies
>> External dependencies excluding Apache projects are as follows:
>> * JDOM - http://jdom.org/ - Apache-style license
>> * SQL Builder - http://openhms.sourceforge.net/sqlbuilder/ - Artistic
>> License, LGPL. SQL Builder is intended to be removed from the source due to
>> technical reasons anyway.
>> * HSQLDB - http://hsqldb.org/ - BSD-style license
>> * JUnit - http://junit.org - Common Public License 1.0
>> * SLF4J - http://www.slf4j.org/ - MIT License
>> * Google Guava Libraries - http://code.google.com/p/guava-libraries/ -
>> Apache License 2.0
>>
>> Required Resources
>> Mailing Lists
>> * gora-private (with moderated subscriptions)
>> * gora-dev
>> * gora-commits
>> Subversion Directory
>> * http://svn.apache.org/repos/asf/incubator/gora
>>
>> Issue Tracking
>> * JIRA (GORA)
>> Other Resources
>> We need a wiki at http://wiki.apache.org. Currently, we have a wiki at
>> Github. Since there are not many pages there, we can manually move the
>> pages to the wiki at wiki.apache.org.
>>
>> Initial Committers
>> *    Name               email                        Affiliation     Timezone
>> *    Enis Söztutar      enis [at] apache.org         Konneka         +3
>> *    Doğacan Güney      dogacan [at] apache.org      Konneka         +3
>> *    Sertan Alkan       sertanalkan [at] gmail.com   Konneka         +3
>> *    Julien Nioche      jnioche [at] apache.org      DigitalPebble   +1
>> *    Andrzej Bialecki   ab [at] apache.org           Sigram
>> *    Andrew Hart        ahart [at] apache.org        NASA JPL        -8
>> *    Dave Woollard      woollard [at] apache.org     NASA JPL        -8
>> *    Henry Saputra      hsaputra [at] apache.org     Yahoo!          -8
>>
>>  Affiliations
>>  All of the parties are affiliated with companies and organizations that
>>  are familiar with the development of open source. Most of the original
>>  Gora development was sponsored by ant.com, however we expect that the
>>  amount of volunteer work will increase, and more developers will come on
>>  board.
>>
>>  Sponsors
>>  Champion
>>  * Chris Mattmann (mattmann AT apache DOT org)
>>  Nominated Mentors
>>  * Chris Mattmann (mattmann AT apache DOT org)
>>  * Andrzej Bialecki (ab AT apache DOT org )
>>  * Tom White (tomwhite AT apache DOT org)
>>  Sponsoring Entity
>>  Apache Incubator. Successful graduation can result in either being a TLP,
>>  or a subproject of Hadoop, since most of the community is projected to
>>  overlap.
>>
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: Chris.Mattmann@jpl.nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>>
>



-- 
Thanks
- Mohammad Nour
  Author of (WebSphere Application Server Community Edition 2.0 User Guide)
  http://www.redbooks.ibm.com/abstracts/sg247585.html
- LinkedIn: http://www.linkedin.com/in/mnour
- Blog: http://tadabborat.blogspot.com
----
"Life is like riding a bicycle. To keep your balance you must keep moving"
- Albert Einstein

"Writing clean code is what you must do in order to call yourself a
professional. There is no reasonable excuse for doing anything less
than your best."
- Clean Code: A Handbook of Agile Software Craftsmanship

"Stay hungry, stay foolish."
- Steve Jobs

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

