gora-commits mailing list archives

From lewi...@apache.org
Subject svn commit: r1458075 - in /gora/cms_site/trunk/content: ./ current/
Date Tue, 19 Mar 2013 01:12:36 GMT
Author: lewismc
Date: Tue Mar 19 01:12:35 2013
New Revision: 1458075

URL: http://svn.apache.org/r1458075


Modified: gora/cms_site/trunk/content/about.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/about.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/about.md (original)
+++ gora/cms_site/trunk/content/about.md Tue Mar 19 01:12:35 2013
@@ -1,6 +1,6 @@
 Title: About Apache Gora™
-Why Gora?
+##Why Gora?
 Although there are various excellent ORM frameworks for relational databases, data modeling in 
 NoSQL data stores differ profoundly from their relational cousins. Moreover, data-model agnostic 
 frameworks such as JDO are not sufficient for use cases, where one needs to use the full power 
@@ -26,7 +26,7 @@ for big data. The roadmap of Gora can be
   support for data in the data store.
 ORM stands for Object Relation Mapping. It is a technology which abstracts the persistency layer 
 (mostly Relational Databases) so that plain domain level objects can be used, without the cumbersome 
 effort to save/load the data to and from the database. Gora differs from current solutions in that:

Modified: gora/cms_site/trunk/content/contribute.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/contribute.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/contribute.md (original)
+++ gora/cms_site/trunk/content/contribute.md Tue Mar 19 01:12:35 2013
@@ -1,6 +1,6 @@
 Title: How to Contribute
-Gora Development Process
+##Gora Development Process
 Gora assumes a development process encouraged by the Apache Software Foundation (ASF). 
 ASF is based on [meritocracy](http://www.apache.org/foundation/how-it-works.html). 
 We encourage open discussion and open development. Nearly everything in Gora is done over 

Modified: gora/cms_site/trunk/content/credits.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/credits.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/credits.md (original)
+++ gora/cms_site/trunk/content/credits.md Tue Mar 19 01:12:35 2013
@@ -1,8 +1,8 @@
 Title: Gora Credits
-A page dedicated to our community past and present!
+##A page dedicated to our community past and present!
 Gora active committers include (ordered by username)
 * Andrzej Bialecki (ab) - Getopt  **CP**
@@ -40,10 +40,9 @@ CH
 :   project champion
 Other Gora contributors and their contributions are listed at Apache 
-##TODO find URL for contributors from Jira.
-How to contribute
+##How to contribute
 There are lots of ways you can contribute to Gora. Make sure you check them all [here](./contribute.html).

Modified: gora/cms_site/trunk/content/current/gora-accumulo.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/current/gora-accumulo.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/current/gora-accumulo.md (original)
+++ gora/cms_site/trunk/content/current/gora-accumulo.md Tue Mar 19 01:12:35 2013
@@ -1,10 +1,10 @@
 Title: Gora Accumulo Module
 This is the main documentation for the gora-accumulo module which
 enables [Apache Accumulo](http://accumulo.apache.org) backend support for Gora. 
 * gora.datastore.default=org.apache.gora.accumulo.store.AccumuloStore - Implementation of the storage class 
 * gora.accumulo.mapping.file                                          - The XML mapping file to be used 
 * gora.datastore.accumulo.mock=true                                   - coming soon
@@ -13,5 +13,5 @@ gora.properties 
 * gora.datastore.accumulo.user=root                                   - coming soon
 * gora.datastore.accumulo.password=secret                             - coming soon
-Gora Accumulo mappings 
+##Gora Accumulo mappings 
 * coming soon

Modified: gora/cms_site/trunk/content/current/gora-cassandra.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/current/gora-cassandra.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/current/gora-cassandra.md (original)
+++ gora/cms_site/trunk/content/current/gora-cassandra.md Tue Mar 19 01:12:35 2013
@@ -1,18 +1,19 @@
 Title: Gora Cassandra Module
 This is the main documentation for the gora-cassandra module which
 enables [Apache Cassandra](http://cassandra.apache.org) backend support for Gora. 
 * gora.datastore.default=org.apache.gora.cassandra.store.CassandraStore - Implementation of the storage class 
 * gora.cassandra.mapping.file                                           - The XML mapping file to be used 
 * gora.cassandra.servers=localhost:9160                                 - This value should specify the host:port 
     for a running Cassandra server or node. In this case the server happens to be running on localhost at port 9160 
     which is the default Cassandra server configuration.
-Gora Cassandra mappings 
+##Gora Cassandra mappings 
 Say we wished to map some Employee data and store it into the CassandraStore.
       <keyspace name="Employee" host="localhost" cluster="Gora Cassandra Test Cluster">
         <family name="p"/>

Modified: gora/cms_site/trunk/content/current/gora-conf.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/current/gora-conf.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/current/gora-conf.md (original)
+++ gora/cms_site/trunk/content/current/gora-conf.md Tue Mar 19 01:12:35 2013
@@ -0,0 +1,61 @@
+Title: Gora Configuration
+Gora reads the necessary configuration from a properties file named 
+gora.properties. The file is searched for on the classpath, which is 
+obtained using the ClassLoader of the DataStoreFactory class.
+The following properties are recognized:
+  <caption>Common Properties</caption>
+  <tr><th align="left">Property</th> <th align="left">Required</th> <th align="left">Default</th> <th align="left">Explanation</th></tr>
+  <tr><td>gora.datastore.default</td><td>No</td> <td> – </td> <td>The full classname of the default data store implementation to use </td></tr>
+  <tr><td>gora.datastore.autocreateschema</td><td>No</td><td>true</td><td>Whether to create schemas automatically</td></tr>
+gora.datastore.default is perhaps the most important property in this file. 
+This property configures the default DataStore implementation to use. 
+However, other data stores can still be instantiated through the API. 
+Data store implementations in the Gora distribution include:
+  <caption>DataStore implementations</caption>
+  <tr><th align="left">DataStore Implementation</th> <th align="left">Full Class Name</th> <th align="left">Module Name</th> <th align="left">Explanation</th></tr>
+  <tr><td>AvroStore</td> <td>org.apache.gora.avro.store.AvroStore</td> <td>gora-core</td> <td>An adapter DataStore for binary-compatible Apache Avro serializations. AvroDataStore supports Binary and JSON serializations. </td></tr>
+  <tr><td>DataFileAvroStore</td> <td>org.apache.gora.avro.store.DataFileAvroStore</td> <td>gora-core</td> <td>DataFileAvroStore is file based store which uses Avro's DataFile{Writer,Reader}'s as a backend. This datastore supports mapreduce.</td></tr>
+  <tr><td>AccumuloStore</td> <td>org.apache.gora.accumulo.store.AccumuloStore</td> <td>gora-accumulo</td> <td> DataStore for Apache Accumulo. </td></tr>
+  <tr><td>HBaseStore</td> <td>org.apache.gora.hbase.store.HBaseStore</td> <td>gora-hbase</td> <td> DataStore for Apache HBase. </td></tr>
+  <tr><td>CassandraStore</td> <td>org.apache.gora.cassandra.store.CassandraStore</td> <td>gora-cassandra</td> <td> DataStore for Apache Cassandra. </td></tr>
+  <tr><td>SqlStore</td> <td>org.apache.gora.sql.store.SqlStore</td> <td>gora-sql</td> <td> A DataStore implementation for RDBMS with a SQL interface. SqlStore uses JDBC drivers to communicate with the DB. <a href="http://www.mysql.com/">Mysql</a> and <a href="http://hsqldb.org/">Hsqldb</a> are supported for now.</td></tr>
+  <tr><td>MemStore</td> <td>org.apache.gora.memory.store.MemStore</td> <td>gora-core</td> <td> Memory based DataStore implementation for tests. </td></tr>
+  <tr><td>DynamoDBStore</td> <td>org.apache.gora.dynamodb.store.DynamoDBStore</td> <td>gora-dynamodb</td> <td> Webservices-based datastore implementation for Amazon's DynamoDB. </td></tr>
+Some of the properties can be customized per datastore. The format of these 
+properties is as follows: gora.&lt;data_store_class&gt;.&lt;property_name&gt;. 
+Note that &lt;data_store_class&gt; is the classname of the datastore 
+implementation w/o the package name, for example hbasestore. 
+You can also use the string datastore instead of the specific 
+data store class name, in which case, the property setting is effective 
+to all data stores. The following properties can be set per data store.
+##Per DataStore Properties
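+  <tr><th align="left">Property</th> <th align="left">Required</th> <th align="left">Default</th> <th align="left">Explanation</th></tr>
+  <tr><td>gora.&lt;data_store_class&gt;.autocreateschema</td><td>No</td><td>true</td><td>Whether to create schemas automatically for the specific data store</td></tr>
+  <tr><td>gora.&lt;data_store_class&gt;.mapping.file</td><td>No</td><td>gora-{accumulo|hbase|cassandra|sql|dynamodb}-mapping.xml</td><td>The name of the mapping file</td></tr>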
+##Data store specific settings
+Other than the properties above, some of the data stores have their 
+own configurations. These properties are listed at the module documentations:
+* [Gora Core Module](/gora-core.html) (incl. AvroStore, DataFileAvroStore and MemStore)
+* [Gora HBase Module](/gora-hbase.html)
+* [Gora Cassandra Module](/gora-cassandra.html)
+* [Gora SQL Module](/gora-sql.html)
+* [Gora Accumulo Module](/gora-accumulo.html)
+* [Gora DynamoDB Module](/gora-dynamodb.html)
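
The per data store naming scheme described in gora-conf.md above can be illustrated with a small, self-contained Java sketch. It is not Gora's own lookup code: the class name `PerStorePropertySketch`, the helper names, and the precedence (a store-specific key winning over the generic `gora.datastore.` key) are assumptions made for illustration only.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Properties;

/** Illustrative only; hypothetical helper, not part of Gora. */
class PerStorePropertySketch {

    /**
     * Looks up gora.<store class, lower-cased, no package>.<name> first,
     * then gora.datastore.<name>, then falls back to the given default.
     * The precedence order is an assumption of this sketch.
     */
    static String resolve(Properties props, String storeSimpleName,
                          String name, String defaultValue) {
        String specificKey = "gora." + storeSimpleName.toLowerCase(Locale.ROOT) + "." + name;
        String genericKey = "gora.datastore." + name;
        String value = props.getProperty(specificKey);
        if (value == null) {
            value = props.getProperty(genericKey);
        }
        return value != null ? value : defaultValue;
    }

    /** Parses properties from a string so the sketch needs no file on disk. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = parse(
              "gora.datastore.autocreateschema=true\n"
            + "gora.hbasestore.autocreateschema=false\n");
        // The HBase-specific key is present, so it is used.
        System.out.println(resolve(props, "HBaseStore", "autocreateschema", "true"));
        // No Cassandra-specific key, so the generic "datastore" key applies.
        System.out.println(resolve(props, "CassandraStore", "autocreateschema", "true"));
    }
}
```

With the two sample properties, the first lookup yields `false` and the second yields `true`, which mirrors the idea that `datastore` acts as a setting for all stores while a class-specific key targets one store.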

Modified: gora/cms_site/trunk/content/current/gora-core.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/current/gora-core.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/current/gora-core.md (original)
+++ gora/cms_site/trunk/content/current/gora-core.md Tue Mar 19 01:12:35 2013
@@ -0,0 +1,26 @@
+Title: Gora Core Module
+This is the main documentation for the gora-core module. gora-core 
+holds most of the core functionality for the gora project. Every module 
+in gora depends on gora-core. Therefore most of the generic documentation 
+about the project is gathered here as well as the documentation for AvroStore, 
+DataFileAvroStore and MemStore. In addition to this gora-core holds all of the 
+core MapReduce, Persistency, Query and Base DataStore and Utility functionality
+which is also documented here.
+To configure the AvroStore one would typically set the following:
+* gora.avrostore.output.path=hdfs://uri/path/to/hdfs/data/directory || file:///uri/path/to/local/data/directory - This value should point to the hdfs data directory (if running Gora in a distributed Hadoop environment) or to some location on the local file system (if running Gora locally). 
+* gora.avrostore.xxx=xxx - xyz
+##To configure the DataFileAvroStore one would typically set the following:
+* gora.datafileavrostore.xxx=xxx - xyz 
+* gora.datafileavrostore.xxx=xxx - xyz
+To configure the MemStore one would typically set the following:
+* gora.memstore.xxx=xxx - xyz
+##Gora Core mappings
+In the stores covered within the gora-core module, no physical mappings are required.

Modified: gora/cms_site/trunk/content/current/gora-dynamodb.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/current/gora-dynamodb.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/current/gora-dynamodb.md (original)
+++ gora/cms_site/trunk/content/current/gora-dynamodb.md Tue Mar 19 01:12:35 2013
@@ -0,0 +1,12 @@
+Title: Gora DynamoDB Module
+This is the main documentation for the gora-dynamodb module. gora-dynamodb 
+module enables [Amazon DynamoDB](http://aws.amazon.com/dynamodb/) backend support for Gora.
+Coming soon
+##Gora DynamoDB mappings
+Coming soon 

Modified: gora/cms_site/trunk/content/current/gora-hbase.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/current/gora-hbase.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/current/gora-hbase.md (original)
+++ gora/cms_site/trunk/content/current/gora-hbase.md Tue Mar 19 01:12:35 2013
@@ -0,0 +1,11 @@
+Title: Gora HBase Module
+This is the main documentation for the gora-hbase module. gora-hbase 
+module enables [Apache HBase](http://hbase.apache.org) backend support for Gora. 
+Coming soon 
+##Gora HBase mappings
+Coming soon

Modified: gora/cms_site/trunk/content/current/gora-sql.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/current/gora-sql.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/current/gora-sql.md (original)
+++ gora/cms_site/trunk/content/current/gora-sql.md Tue Mar 19 01:12:35 2013
@@ -0,0 +1,11 @@
+Title: Gora SQL Module
+This is the main documentation for the gora-sql module. gora-sql 
+module enables SQL backend support for Gora. Currently [MySQL](http://www.mysql.com) and [HSQLDB](http://www.hsqldb.org) are supported.
+Coming soon
+##Gora SQL mappings
+Coming soon

Modified: gora/cms_site/trunk/content/current/overview.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/current/overview.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/current/overview.md (original)
+++ gora/cms_site/trunk/content/current/overview.md Tue Mar 19 01:12:35 2013
@@ -1,6 +1,6 @@
 Title: Gora Module Overview
 This is the main entry point for Gora documentation. Here are some pointers for further info:
 * First if you haven't already done so, make sure to check the [quick start guide](/quickstart.html).
@@ -10,7 +10,7 @@ This is the main entry point for Gora do
 You can find an abstract overview of how to configure Gora [here](./gora-conf.html).
-Gora Modules
+##Gora Modules
 Gora source code is organized in a modular architecture. The gora-core module 
 is the main module which contains the core of the code. All other modules depend 
 on the gora-core module. 
@@ -18,7 +18,7 @@ Each datastore backend in Gora resides i
 the specific module can be found at the module's documentation directory. 
 It is wise to start by going over the documentation for the gora-core 
- module and then the specific data store module(s) you want to use. The 
+module and then the specific data store module(s) you want to use. The 
 following modules are currently implemented in gora.
 * [gora-core](./gora-core.html): Module containing core functionality, AvroStore and DataFileAvroStore stores;

Modified: gora/cms_site/trunk/content/current/quickstart.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/current/quickstart.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/current/quickstart.md (original)
+++ gora/cms_site/trunk/content/current/quickstart.md Tue Mar 19 01:12:35 2013
@@ -0,0 +1,133 @@
+Title: Quick Start
+This is a quick start guide to help you setup the project.
+First you need to check out the most stable Gora release through the official 
+Apache Gora [release page](../downloads.html).  
+For those who would like to use a development version of Gora or simply wish to 
+work with the bleeding edge, instructions for how to check out the source 
+code using svn or git can be found on the [version control](../version_control.html) documentation. 
+Setting up your project 
+More recently Gora began using Maven to manage its dependencies and build lifecycle. 
+Stable Gora releases are available on the central Maven repository or Ivy repositories 
+and Gora-SNAPSHOT OSGi bundle artifacts are now pushed to the Apache Nexus 
+<a href="https://repository.apache.org/index.html#nexus-search;quick~gora">here</a>.
+Compiling the project
+If you have the source code for Gora, you can compile the project using
+$ cd gora 
+$ mvn clean compile
+You can also compile individual modules by cd'ing to the module directory and running 
+$ mvn clean compile there.
+If you want to use Gora as a dependency, you can manage it in a few ways. 
+Using ivy to manage gora 
+If your project already uses ivy, then you can include gora dependencies
+to your ivy by adding the following lines to your ivy.xml file: 
+      &lt;dependency org="org.apache.gora" name="gora-core" rev="${version}" conf="*-&gt;compile" changing="true"/&gt;
+      &lt;dependency org="org.apache.gora" name="gora-dynamodb" rev="${version}" conf="*-&gt;compile" changing="true"/&gt;
+      &lt;dependency org="org.apache.gora" name="gora-hbase" rev="${version}" conf="*-&gt;compile" changing="true"/&gt;
+      &lt;dependency org="org.apache.gora" name="gora-cassandra" rev="${version}" conf="*-&gt;compile" changing="true"/&gt;
+      &lt;dependency org="org.apache.gora" name="gora-sql" rev="${version}" conf="*-&gt;compile" changing="true"/&gt;
+N.B. The ${version} variable should be replaced by the most stable Gora release.
+Only add the modules that you will use, and set the conf to point to the 
+configurations (of your project) that you want to depend on gora. The 
+changing="true" attribute states that gora artifacts 
+should not be cached, which is required if you want to change gora's 
+source and use the recompiled version.
+Add the following to your ivysettings.xml
+    &lt;resolvers&gt;
+      ...
+      &lt;chain name="internal"&gt;
+        &lt;resolver ref="local"/&gt;
+      &lt;/chain&gt;
+      ...
+    &lt;/resolvers&gt;
+    &lt;modules&gt;
+      ...
+      &lt;module organisation="org.apache.gora" name=".*" resolver="internal"/&gt;
+      ...
+    &lt;/modules&gt;
+This forces gora to be built locally rather than look for it in other repositories.
+Using Maven to manage Gora 
+If your project however uses maven, then you can include gora dependencies
+to your project by adding the following lines to your pom.xml file: 
+	&lt;dependency&gt;
+  		&lt;groupId>org.apache.gora&lt;/groupId&gt;
+  		&lt;artifactId>gora-core&lt;/artifactId&gt;
+  		&lt;version>${version}&lt;/version&gt;
+	&lt;/dependency&gt;
+	&lt;dependency&gt;
+  		&lt;groupId>org.apache.gora&lt;/groupId&gt;
+  		&lt;artifactId>gora-hbase&lt;/artifactId&gt;
+  		&lt;version>${version}&lt;/version&gt;
+	&lt;/dependency&gt;
+	&lt;dependency&gt;
+  		&lt;groupId>org.apache.gora&lt;/groupId&gt;
+  		&lt;artifactId>gora-dynamodb&lt;/artifactId&gt;
+  		&lt;version>${version}&lt;/version&gt;
+	&lt;/dependency&gt;
+	&lt;dependency&gt;
+  		&lt;groupId>org.apache.gora&lt;/groupId&gt;
+  		&lt;artifactId>gora-cassandra&lt;/artifactId&gt;
+  		&lt;version>${version}&lt;/version&gt;
+	&lt;/dependency&gt;
+	&lt;dependency&gt;
+  		&lt;groupId>org.apache.gora&lt;/groupId&gt;
+  		&lt;artifactId>gora-sql&lt;/artifactId&gt;
+  		&lt;version>${version}&lt;/version&gt;
+	&lt;/dependency&gt;
+N.B. The ${version} variable should be replaced by the most stable Gora release.
+Again, only add the modules that you will use.
+Managing gora jars manually
+You can include gora jars manually, if you prefer. After compiling gora, 
+first copy all the jars in the gora-[modulename]/lib/ dir. Then 
+copy all the jars in gora-core/lib/ since all of the modules depend 
+on gora-core. Last, copy the actual gora jars in
+gora-core/build/gora-core-x.x.jar and the jars of all the other 
+modules that you want to use ( for example 
+What's next 
+After setting up gora, you might want to check out the documentation. 
+Most of the current documentation is linked to from the [overview](/overview.html)
+or is available on the [wiki](https://cwiki.apache.org/confluence/display/GORA/Index). 
+Gora Modules
+Gora source code is organized in a modular architecture. The 
+gora-core module is the main module which contains the core of 
+the code. All other modules depend on the gora-core module. Each data 
+store backend in Gora resides in its own module. The documentation for 
+the specific module can be found at the module's documentation directory. 
+It is wise to start by going over the documentation for the gora-core 
+module and then the specific data store module(s) you want to use. All modules 
+are linked to from the [overview](/overview.html).

Modified: gora/cms_site/trunk/content/current/tutorial.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/current/tutorial.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/current/tutorial.md (original)
+++ gora/cms_site/trunk/content/current/tutorial.md Tue Mar 19 01:12:35 2013
@@ -0,0 +1,995 @@
+Title: Gora Tutorial
+Author : Enis Söztutar, enis [at] apache [dot] org
+This is the official tutorial for Apache Gora. For this tutorial, we 
+will be implementing a system to store our web server logs in Apache HBase,
+and analyze the results using Apache Hadoop and store the results either in HSQLDB or MySQL.
+In this tutorial we will first look at how to set up the environment and 
+configure Gora and the data stores. Later, we will go over the data we will use and
+define the data beans that will be used to interact with the persistency layer. 
+Next, we will go over the API of Gora to do some basic tasks such as storing objects, 
+fetching and querying objects, and deleting objects. Last, we will go over an example 
+program which uses Hadoop MapReduce to analyze the web server logs, and discuss the Gora 
+MapReduce API in some detail.
+##Introduction to Gora
+The Apache Gora open source framework provides an in-memory data 
+model and persistence for big data. Gora supports persisting to 
+column stores, key value stores, document stores and RDBMSs, and 
+analyzing the data with extensive Apache Hadoop MapReduce support. In Avro, the 
+beans to hold the data and RPC interfaces are defined using a JSON 
+schema. In mapping the data beans to data store specific settings, 
+Gora depends on mapping files, which are specific to each data store. 
+Unlike other ORM implementations, in Gora the data bean to data store 
+specific schema mapping is explicit. This has the advantage that, 
+when using data models such as HBase and Cassandra, you can always 
+know how the values are persisted.
+Gora has a modular architecture. Most of the data stores in Gora 
+have their own module, such as gora-hbase, gora-cassandra,
+and gora-sql. In your projects, you only need to include 
+the artifacts from the modules you use. You can consult the [quick start](/quickstart.html)
+for setting up your project.
+##Setting up Gora
+As a first step, we need to download and compile the Gora source code. The source code 
+for the tutorial is in the gora-tutorial module. If you have
+already downloaded Gora, that's cool; otherwise, please go
+over the steps in the [quickstart](/quickstart.html) guide for
+how to download and compile Gora.
+Now, after the source code for Gora is at hand, let's have a look at the files under the 
+directory gora-tutorial. 
+    $ cd gora-tutorial
+    $ tree
+    |-- build.xml
+    |-- conf
+    |   |-- gora-hbase-mapping.xml
+    |   |-- gora-sql-mapping.xml
+    |   `-- gora.properties
+    |-- ivy
+    |   `-- ivy.xml
+    `-- src
+        |-- examples
+        |   `-- java
+        |-- main
+        |   |-- avro
+        |   |   |-- metricdatum.json
+        |   |   `-- pageview.json
+        |   |-- java
+        |   |   `-- org
+        |   |       `-- apache
+        |   |           `-- gora
+        |   |               `-- tutorial
+        |   |                   `-- log
+        |   |                       |-- KeyValueWritable.java
+        |   |                       |-- LogAnalytics.java
+        |   |                       |-- LogManager.java
+        |   |                       |-- TextLong.java
+        |   |                       `-- generated
+        |   |                           |-- MetricDatum.java
+        |   |                           `-- Pageview.java
+        |   `-- resources
+        |       `-- access.log.tar.gz
+        `-- test
+            |-- conf
+            `-- java
+Since gora-tutorial is a top level module of Gora, it depends on the directory
+structure imposed by Gora's main build scripts (build.xml and 
+build-common.xml with Ivy and pom.xml for Maven). The Java source code resides in directory 
+src/main/java/, avro schemas in src/main/avro/, and data in src/main/resources/.
+##Setting up HBase
+For this tutorial we will be using HBase to 
+store the logs. For those of you not familiar with HBase, it is a NoSQL
+column store with an architecture very similar to Google's BigTable.
+If you don't already have HBase set up, you can go over the steps in the 
+[HBase Overview](http://hbase.apache.org/book/quickstart.html)
+documentation. Gora aims to support the most recent HBase versions; however, if you
+find compatibility problems please [get in touch](../mailing_lists.html).
+So download an [HBase release](http://hbase.apache.org/releases.html). 
+After extracting the file, cd to the hbase-${dist} directory and start the HBase server. 
+    $ bin/start-hbase.sh
+and make sure that HBase is available by using the HBase shell. 
+    $ bin/hbase shell
+##Configuring Gora
+Gora is configured through a file in the classpath named gora.properties. 
+We will be using the following file gora-tutorial/conf/gora.properties
+      gora.datastore.default=org.apache.gora.hbase.store.HBaseStore
+      gora.datastore.autocreateschema=true
+This file states that the default store will be HBaseStore,
+and schemas(tables) should be automatically created.
+More information for configuring different settings in gora.properties 
+can be found [here](/gora-conf.html).
+##Modelling the data
+For this tutorial, we will be parsing and storing the logs of a web server. 
+Some example logs are at src/main/resources/access.log.tar.gz, which 
+belongs to the (now shutdown) server at http://www.buldinle.com/. 
+Example logs contain 10,000 lines, between dates 2009/03/10 - 2009/03/15.
+The first thing we need to do is to extract the logs.
+    $ tar zxvf src/main/resources/access.log.tar.gz -C src/main/resources/
+You can also use your own log files, given that the log 
+format is [Combined Log Format](http://httpd.apache.org/docs/current/logs.html).
+Some example lines from the log are:
+ - - [10/Mar/2009:20:40:26 +0200] "GET / HTTP/1.1" 200 43 "http://www.buldinle.com/" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; GTB5; .NET CLR 2.0.50727; InfoPath.2)
+ - - [11/Mar/2009:00:07:40 +0200] "GET /index.php?i=3&amp;a=1__6x39kovbji8&amp;k=3750105 HTTP/1.1" 200 43 "http://www.buldinle.com/index.php?i=3&amp;a=1__6X39Kovbji8&amp;k=3750105" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; OfficeLiveConnector.1.3; OfficeLivePatch.0.0)
+ - - [12/Mar/2009:18:18:25 +0200] "GET /index.php?a=3__x7l72c&amp;k=4476881 HTTP/1.1" 200 43 "http://www.buldinle.com/index.php?a=3__x7l72c&amp;k=4476881" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; InfoPath.1)
+The fields in order are: user's IP, ignored, ignored, date and 
+time, HTTP method, URL, HTTP protocol version, HTTP status code, number of bytes 
+returned, referrer, and user agent.
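
To make the field order above concrete, here is a minimal Java sketch that splits one Combined Log Format line into the same pieces. It is not the tutorial's LogManager code: the class name `CombinedLogLineSketch`, the regular expression, and the choice to treat a `-` byte count as 0 are assumptions of this sketch, and the sample IP address used in `main` (192.0.2.1) is a made-up placeholder because the addresses are not shown in the example lines.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only; hypothetical parser for one Combined Log Format line. */
class CombinedLogLineSketch {
    // ip, ignored, ignored, [date], "method url protocol", status, bytes, "referrer", "user agent"
    private static final Pattern LINE = Pattern.compile(
        "^(\\S+) (\\S+) (\\S+) \\[([^\\]]+)\\] \"(\\S+) (\\S+) (\\S+)\" "
        + "(\\d{3}) (\\d+|-) \"([^\"]*)\" \"([^\"]*)\"$");
    private static final DateTimeFormatter TIME =
        DateTimeFormatter.ofPattern("dd/MMM/yyyy:HH:mm:ss Z", Locale.ENGLISH);

    final String ip;
    final long timestamp;      // milliseconds since the epoch
    final String httpMethod;
    final String url;
    final int httpStatusCode;
    final int responseSize;
    final String referrer;
    final String userAgent;

    private CombinedLogLineSketch(Matcher m) {
        ip = m.group(1);
        timestamp = OffsetDateTime.parse(m.group(4), TIME).toInstant().toEpochMilli();
        httpMethod = m.group(5);
        url = m.group(6);
        // group 7 is the protocol version, which this sketch does not keep
        httpStatusCode = Integer.parseInt(m.group(8));
        responseSize = "-".equals(m.group(9)) ? 0 : Integer.parseInt(m.group(9));
        referrer = m.group(10);
        userAgent = m.group(11);
    }

    static CombinedLogLineSketch parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not in Combined Log Format: " + line);
        }
        return new CombinedLogLineSketch(m);
    }

    public static void main(String[] args) {
        CombinedLogLineSketch view = parse(
            "192.0.2.1 - - [10/Mar/2009:20:40:26 +0200] \"GET / HTTP/1.1\" 200 43 "
            + "\"http://www.buldinle.com/\" \"Mozilla/4.0 (compatible; MSIE 6.0)\"");
        System.out.println(view.httpMethod + " " + view.url + " -> " + view.httpStatusCode);
    }
}
```

The public fields deliberately use the same names as the Pageview schema fields so the correspondence between a log line and a data bean is easy to follow.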
+##Defining data beans
+Data beans are the main way to hold the data in memory and persist in Gora. Gora 
+needs to explicitly keep track of the status of the data in memory, so 
+we use [Apache Avro](http://avro.apache.org) for defining the beans. Using 
+Avro gives us the possibility to explicitly keep track of an object's persistency state, 
+and a way to serialize the object's data. 
+Defining data beans is a very easy task, but for the exact syntax, please 
+consult the [Avro Specification](http://avro.apache.org/docs/current/spec.html).
+First, we need to define the bean Pageview to hold a
+single URL access in the logs. Let's go over the class at src/main/avro/pageview.json 
+     {
+      "type": "record",
+      "name": "Pageview",
+      "namespace": "org.apache.gora.tutorial.log.generated",
+      "fields" : [
+        {"name": "url", "type": "string"},
+        {"name": "timestamp", "type": "long"},
+        {"name": "ip", "type": "string"},
+        {"name": "httpMethod", "type": "string"},
+        {"name": "httpStatusCode", "type": "int"},
+        {"name": "responseSize", "type": "int"},
+        {"name": "referrer", "type": "string"},
+        {"name": "userAgent", "type": "string"}
+      ]
+    }
+Avro schemas are declared in JSON. Records 
+are defined with type "record", with a name as the name of the class, and a 
+namespace which is mapped to the package name in Java. The fields 
+are listed in the "fields" element. Each field is given with its type. 
+##Compiling Avro Schemas
+The next step after defining the data beans is to compile the schemas 
+into Java classes. For that we will use GoraCompiler. 
+Invoking the Gora compiler by (from Gora top level directory)
+    $ bin/gora compile
+results in:
+    $ Usage: SpecificCompiler &lt;schema file&gt; &lt;output dir&gt;
+so we will issue :
+    $ bin/gora compile gora-tutorial/src/main/avro/pageview.json gora-tutorial/src/main/java/
+to compile the Pageview class into gora-tutorial/src/main/java/org/apache/gora/tutorial/log/generated/Pageview.java. 
+However, the tutorial java classes are already committed, so you do not need to do that now.
+The Gora compiler extends Avro's SpecificCompiler to convert a JSON definition 
+into a Java class. Generated classes implement the Persistent interface. 
+Most of the methods of the Persistent interface deal with bookkeeping for 
+persistence, and state tracking, so most of the time they are not used explicitly by the
+user. Now, let's look at the internals of the generated class Pageview.java.
+    public class Pageview extends PersistentBase {
+    private Utf8 url;
+    private long timestamp;
+    private Utf8 ip;
+    private Utf8 httpMethod;
+    private int httpStatusCode;
+    private int responseSize;
+    private Utf8 referrer;
+    private Utf8 userAgent;
+    ...
+    public static final Schema _SCHEMA = Schema.parse("{\"type\":\"record\", ... ");
+      public static enum Field {
+      URL(0,"url"),
+      TIMESTAMP(1,"timestamp"),
+      IP(2,"ip"),
+      HTTP_METHOD(3,"httpMethod"),
+      HTTP_STATUS_CODE(4,"httpStatusCode"),
+      RESPONSE_SIZE(5,"responseSize"),
+      REFERRER(6,"referrer"),
+      USER_AGENT(7,"userAgent"),
+      ;
+      private int index;
+      private String name;
+      Field(int index, String name) {this.index=index;this.name=name;}
+      public int getIndex() {return index;}
+      public String getName() {return name;}
+      public String toString() {return name;}
+      };
+    public static final String[] _ALL_FIELDS = {"url","timestamp","ip","httpMethod"
+      ,"httpStatusCode","responseSize","referrer","userAgent",};
+    ...
+    }
+We can see the actual field declarations in the class. Note that Avro uses Utf8 
+class as a placeholder for string fields. We can also see the embedded Avro 
+Schema declaration and an inner enum named Field. This enum and 
+the _ALL_FIELDS field will come in handy when we will use them 
+to query the datastore for specific fields. 
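
As a rough illustration of why the Field enum is convenient, the following self-contained Java sketch turns a selection of enum constants into the plain field names one would ask a data store for. The enum here is a cut-down copy of the generated one (three fields only), and the class `FieldSelectionSketch` and its `names` helper are hypothetical; how the resulting array is handed to a query depends on the Gora query API, which this sketch does not use.

```java
import java.util.Arrays;

/** Illustrative only; hypothetical helper built around a cut-down Field enum. */
class FieldSelectionSketch {

    /** Shortened copy of the generated enum, limited to three fields for brevity. */
    enum Field {
        URL(0, "url"),
        TIMESTAMP(1, "timestamp"),
        IP(2, "ip");

        private final int index;
        private final String name;

        Field(int index, String name) {
            this.index = index;
            this.name = name;
        }

        int getIndex() { return index; }
        String getName() { return name; }
    }

    /** Maps the chosen enum constants to their schema field names, in the order given. */
    static String[] names(Field... fields) {
        return Arrays.stream(fields).map(Field::getName).toArray(String[]::new);
    }

    public static void main(String[] args) {
        // Select only the url and ip fields.
        System.out.println(Arrays.toString(names(Field.URL, Field.IP)));
    }
}
```

Referring to fields through enum constants rather than string literals means a misspelled field name fails at compile time instead of silently returning no data.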
+##Defining data store mappings
+Gora is designed to flexibly work with various types of data modeling, 
+including column stores(such as HBase, Cassandra, etc), SQL databases, flat files(binary, 
+JSON, XML encoded), and key-value stores. The mapping between the data bean and 
+the data store is thus defined in XML mapping files. Each data store has its own 
+mapping format, so that data-store specific settings can be leveraged more easily.
+The mapping files declare how the fields of the classes declared in Avro schemas 
+are serialized and persisted to the data store.
+###HBase mappings
+HBase mappings are stored in a file named gora-hbase-mapping.xml. 
+For this tutorial we will be using the file gora-tutorial/conf/gora-hbase-mapping.xml.
+    <gora-orm>
+      <table name="Pageview"> <!-- optional descriptors for tables -->
+        <family name="common"/> <!-- This can also have params like compression, bloom filters -->
+        <family name="http"/>
+        <family name="misc"/>
+      </table>
+      <class name="org.apache.gora.tutorial.log.generated.Pageview" keyClass="java.lang.Long" table="AccessLog">
+        <field name="url" family="common" qualifier="url"/>
+        <field name="timestamp" family="common" qualifier="timestamp"/>
+        <field name="ip" family="common" qualifier="ip" />
+        <field name="httpMethod" family="http" qualifier="httpMethod"/>
+        <field name="httpStatusCode" family="http" qualifier="httpStatusCode"/>
+        <field name="responseSize" family="http" qualifier="responseSize"/>
+        <field name="referrer" family="misc" qualifier="referrer"/>
+        <field name="userAgent" family="misc" qualifier="userAgent"/>
+      </class>
+      ...
+    </gora-orm>
+Every mapping file starts with the top level element &lt;gora-orm&gt;. 
+Gora HBase mapping files can have two types of child elements, table and 
+class declarations. All of the table and class definitions should be 
+listed at this level.
+The table declaration is optional and most of the time Gora infers the table 
+declaration from the class sub-elements. However, some HBase-specific 
+table configuration such as compression, blockCache, etc. can be given here, 
+if Gora is used to auto-create the tables. The exact syntax for the file can be found 
+in the [gora-hbase](/gora-hbase.html) documentation.
+In Gora, data store access is always 
+done in a key-value data model, since most of the target backends support this model.
+The DataStore API expects to know the class names of the key and persistent classes, so that 
+they can be instantiated. The key-value pair is declared in the class element.
+The name attribute is the fully qualified name of the persistent class, 
+and the keyClass attribute is the fully qualified class name of the key class.
+Children of the &lt;class&gt; element are &lt;field&gt; 
+elements. Each field element has a name and family attribute, and 
+an optional qualifier attribute. The name attribute contains the name 
+of the field in the persistent class, and family declares the column family 
+of the HBase data model. If the qualifier is not given, the name of the field is used 
+as the column qualifier. Note that map and array type fields are stored in unique column 
+families, so the configuration should list a unique column family for each map and 
+array type, and no qualifier should be given. The exact data model is discussed further 
+in the [gora-hbase](/gora-hbase.html) documentation. 
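The qualifier fallback described above can be sketched in a few lines. This is only an illustration of the rule, assuming a column is addressed as `family:qualifier` as in the HBase shell; `ColumnNameSketch` and its `column` method are hypothetical and not part of Gora.

```java
/** Hypothetical helper illustrating the qualifier fallback; not part of Gora. */
final class ColumnNameSketch {
  /** Returns "family:qualifier", falling back to the field name when no qualifier is given. */
  static String column(String fieldName, String family, String qualifier) {
    String q = (qualifier == null || qualifier.isEmpty()) ? fieldName : qualifier;
    return family + ":" + q;
  }

  public static void main(String[] args) {
    System.out.println(column("url", "common", "url"));     // explicit qualifier
    System.out.println(column("httpMethod", "http", null)); // qualifier omitted
  }
}
```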
+##Basic API
+###Parsing the logs
+Now that we have the basic setup, we can see the Gora API in action. As you will notice below, the API 
+is pretty simple to use. We will be using the class LogManager (which is located at
+gora-tutorial/src/main/java/org/apache/gora/tutorial/log/LogManager.java) for parsing 
+and storing the logs, deleting some lines and querying. 
+First of all, let us look at the constructor. The only real thing it does is call the 
+init() method. The init() method constructs the 
+DataStore instance so that it can be used by the LogManager's methods.
+      public LogManager() {
+        try {
+          init();
+        } catch (IOException ex) {
+          throw new RuntimeException(ex);
+        }
+      }
+      private void init() throws IOException {
+        dataStore = DataStoreFactory.getDataStore(Long.class, Pageview.class);
+      }
+DataStore is probably the most important class in the Gora API. 
+DataStore handles actual object persistence. Objects can be persisted, 
+fetched, queried or deleted by the DataStore methods. Every data store that Gora supports defines its own subclass 
+of the DataStore class. For example, the gora-hbase module defines HBaseStore, and 
+the gora-sql module defines SqlStore. However, these subclasses are not explicitly 
+used by the user.
+DataStores always have associated key and value (persistent) classes. The key class is the class of the keys of the 
+data store, and the value class is the actual data bean's class. The value class is almost always generated from 
+Avro schema definitions using the Gora compiler.
+Data store objects are created by DataStoreFactory. It is necessary to 
+provide the key and value class. The datastore class is optional, 
+and if not specified it will be read from the configuration (gora.properties).
+For this tutorial, we have already defined the Avro schema to use and compiled
+our data bean into the Pageview class. For keys in the data store, we will be using Longs. 
+The keys will hold the line number of the pageview in the data file.
+Next, let's look at the main function of the LogManager class.
+    public static void main(String[] args) throws Exception {
+      if(args.length < 2) {
+        System.err.println(USAGE);
+        System.exit(1);
+      }
+      LogManager manager = new LogManager();
+      if("-parse".equals(args[0])) {
+        manager.parse(args[1]);
+      } else if("-get".equals(args[0])) {
+        manager.get(Long.parseLong(args[1]));
+      } else if("-query".equals(args[0])) {
+        if(args.length == 2) {
+          manager.query(Long.parseLong(args[1]));
+        } else {
+          manager.query(Long.parseLong(args[1]), Long.parseLong(args[2]));
+        }
+      } else if("-delete".equals(args[0])) {
+        manager.delete(Long.parseLong(args[1]));
+      } else if("-deleteByQuery".equalsIgnoreCase(args[0])) {
+        manager.deleteByQuery(Long.parseLong(args[1]), Long.parseLong(args[2]));
+      } else {
+        System.err.println(USAGE);
+        System.exit(1);
+      }
+      manager.close();
+    }
+We can use the example log manager program from the command line (in the top level Gora directory): 
+    $ bin/gora logmanager 
+which lists the usage as:
+    LogManager -parse <input_log_file>
+           -get <lineNum>
+           -query <lineNum>
+           -query <startLineNum> <endLineNum>
+           -delete <lineNum>
+           -deleteByQuery <startLineNum> <endLineNum>
+So to parse and store our logs located at gora-tutorial/src/main/resources/access.log, we will issue:
+    $ bin/gora logmanager -parse gora-tutorial/src/main/resources/access.log
+This should output something like:
+    10/09/30 18:30:17 INFO log.LogManager: Parsing file:gora-tutorial/src/main/resources/access.log
+    10/09/30 18:30:23 INFO log.LogManager: finished parsing file. Total number of log lines:10000
+Now, let's look at the code which parses the data and stores the logs.
+    private void parse(String input) throws IOException, ParseException {
+      BufferedReader reader = new BufferedReader(new FileReader(input));
+      long lineCount = 0;
+      try {
+        String line;
+        //read until end of file; an empty file simply yields no iterations
+        while((line = reader.readLine()) != null) {
+          Pageview pageview = parseLine(line);
+          if(pageview != null) {
+            //store the pageview 
+            storePageview(lineCount++, pageview);
+          }
+        }
+      } finally {
+        reader.close();
+      }
+    }
+The file is iterated line by line. Notice that the parseLine(line)
+function does the actual parsing, converting the string to a Pageview object 
+defined earlier.
+    private Pageview parseLine(String line) throws ParseException {
+      StringTokenizer matcher = new StringTokenizer(line);
+      //parse the log line
+      String ip = matcher.nextToken();
+      ...
+      //construct and return pageview object
+      Pageview pageview = new Pageview();
+      pageview.setIp(new Utf8(ip));
+      pageview.setTimestamp(timestamp);
+      ...
+      return pageview;
+    }
+parseLine() uses standard StringTokenizers for the job 
+and constructs and returns a Pageview object.
+###Storing objects in the DataStore
+If we look back at the parse() method above, we can see that the 
+Pageview objects returned by parseLine() are stored via the 
+storePageview() method. 
+The storePageview() method is where the magic happens, but if we look at the code,
+we can see that it is dead simple.
+    /** Stores the pageview object with the given key */
+    private void storePageview(long key, Pageview pageview) throws IOException {
+      dataStore.put(key, pageview);
+    }
+All we need to do is to call the put() method, which expects a long as key and an instance of Pageview
+as a value.
+###Closing the DataStore
+DataStore implementations can do a lot of caching for performance. 
+However, this means that data is not always flushed to persistent storage immediately. 
+So once we have finished storing objects, we need to close the datastore 
+instance by calling its close() method. 
+LogManager always closes its datastore in its own close() method.  
+    private void close() throws IOException {
+      //It is very important to close the datastore properly, otherwise
+      //some data loss might occur.
+      if(dataStore != null) {
+        dataStore.close();
+      }
+    }
+If you are pushing a lot of data, or if you want your data to be accessible before closing 
+the data store, you can also use the flush()
+method which, as expected, flushes the data to the underlying data store. However, the actual flush 
+semantics can vary by the data store backend. For example, in SQL, flush calls commit()
+on the JDBC Connection object, whereas in HBase, HTable#flushCommits() is called.
+Also note that even if you call flush() at the end of all data manipulation operations, 
+you still need to call close() on the datastore.
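To make the difference between flush() and close() concrete, here is a minimal in-memory model of a write-buffering store. It is a sketch under stated assumptions, not how any Gora backend is implemented: `BufferedStoreSketch`, its two maps and `persistedCount()` are all hypothetical names invented for the illustration.

```java
import java.io.Closeable;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory model of a write-buffering store; not a Gora class. */
final class BufferedStoreSketch implements Closeable {
  private final Map<Long, String> buffer = new LinkedHashMap<>();    // pending writes
  private final Map<Long, String> persisted = new LinkedHashMap<>(); // stands in for the backend

  void put(long key, String value) { buffer.put(key, value); }

  /** Pushes pending writes to the backend but keeps the store usable. */
  void flush() {
    persisted.putAll(buffer);
    buffer.clear();
  }

  /** Flushes whatever is still pending; a real store would also release resources here. */
  @Override
  public void close() { flush(); }

  int persistedCount() { return persisted.size(); }

  public static void main(String[] args) {
    BufferedStoreSketch store = new BufferedStoreSketch();
    try {
      store.put(0L, "first");
      System.out.println(store.persistedCount()); // write is still buffered
      store.flush();
      System.out.println(store.persistedCount()); // visible after flush
      store.put(1L, "second");
    } finally {
      store.close(); // the last write only reaches the backend here
    }
    System.out.println(store.persistedCount());
  }
}
```

The try/finally shape mirrors the advice above: whatever happens while writing, close() still runs, so buffered records are not silently dropped.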
+##Persisted data in HBase
+Now that we have stored the web access log data in HBase, we can look at
+how the data is laid out there. For that, start the HBase shell.
+    $ cd ../hbase-${version}
+    $ bin/hbase shell
+If you have a fresh HBase installation, there should be one table.
+    hbase(main):010:0> list
+    AccessLog                                                                                                     
+    1 row(s) in 0.0470 seconds
+Remember that AccessLog is the name of the table we specified in 
+gora-hbase-mapping.xml. Looking at the contents of the table:
+    hbase(main):010:0> scan 'AccessLog', {LIMIT=>1}
+    ROW                          COLUMN+CELL                                                                      
+     \x00\x00\x00\x00\x00\x00\x0 column=common:ip, timestamp=1285860617341, value=                  
+     0\x00                                                                                                        
+     \x00\x00\x00\x00\x00\x00\x0 column=common:timestamp, timestamp=1285860617341, value=\x00\x00\x01\x1F\xF1\xAEl
+     0\x00                       P                                                                                
+     \x00\x00\x00\x00\x00\x00\x0 column=common:url, timestamp=1285860617341, value=/index.php?a=1__wwv40pdxdpo&k=2
+     0\x00                       18978                                                                            
+     \x00\x00\x00\x00\x00\x00\x0 column=http:httpMethod, timestamp=1285860617341, value=GET                       
+     0\x00                                                                                                        
+     \x00\x00\x00\x00\x00\x00\x0 column=http:httpStatusCode, timestamp=1285860617341, value=\x00\x00\x00\xC8      
+     0\x00                                                                                                        
+     \x00\x00\x00\x00\x00\x00\x0 column=http:responseSize, timestamp=1285860617341, value=\x00\x00\x00+           
+     0\x00                                                                                                        
+     \x00\x00\x00\x00\x00\x00\x0 column=misc:referrer, timestamp=1285860617341, value=http://www.buldinle.com/inde
+     0\x00                       x.php?a=1__WWV40pdxdpo&k=218978
+     \x00\x00\x00\x00\x00\x00\x0 column=misc:userAgent, timestamp=1285860617341, value=Mozilla/4.0 (compatible; MS
+     0\x00                       IE 6.0; Windows NT 5.1)
+The output shows all the columns matching the first line with key 0. We can see 
+the columns common:ip, common:timestamp, common:url, etc. Remember that 
+these are the columns that we have described in the gora-hbase-mapping.xml file.
+You can also count the number of entries in the table to make sure that all the records
+have been stored.
+    hbase(main):010:0> count 'AccessLog'
+      ... 
+      10000 row(s) in 1.0580 seconds
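The row keys and numeric values in the scan output look opaque because they are raw bytes. The output suggests fixed-width big-endian encodings: the row key for line 0 is eight zero bytes, httpStatusCode 200 appears as `\x00\x00\x00\xC8`, and responseSize 43 appears as `\x00\x00\x00+` (the shell prints byte 0x2B as the ASCII character `+`). The sketch below reproduces those byte patterns with plain `java.nio`; `ByteEncodingSketch` is a hypothetical helper, and that the backend uses exactly this encoding is an assumption read off the output above.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper for reading the scan output; not part of Gora or HBase. */
final class ByteEncodingSketch {
  /** Big-endian 8-byte encoding of a long, rendered as hex. */
  static String hex(long value) {
    return hex(ByteBuffer.allocate(Long.BYTES).putLong(value).array());
  }

  /** Big-endian 4-byte encoding of an int, rendered as hex. */
  static String hex(int value) {
    return hex(ByteBuffer.allocate(Integer.BYTES).putInt(value).array());
  }

  private static String hex(byte[] bytes) {
    StringBuilder sb = new StringBuilder();
    for (byte b : bytes) {
      sb.append(String.format("%02X", b));
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(hex(0L));  // row key of the first log line
    System.out.println(hex(200)); // httpStatusCode
    System.out.println(hex(43));  // responseSize
  }
}
```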
+##Fetching objects from data store
+Fetching objects from the data store is as easy as storing them. There are essentially 
+two methods for fetching objects. The first is to fetch a single object given its key. The 
+second is to run a query through the data store.
+To fetch objects one by one, we can use one of the overloaded 
+get() methods. 
+The method with signature get(K key) returns the object corresponding to the given key, fetching all the 
+fields. On the other hand, get(K key, String[] fields) returns the object corresponding to the 
+given key, but fetches only the fields given as the second argument.
+When run with the argument -get, the LogManager class fetches the pageview object 
+from the data store and prints the results.
+    /** Fetches a single pageview object and prints it*/
+    private void get(long key) throws IOException {
+      Pageview pageview = dataStore.get(key);
+      printPageview(pageview);
+    }
+To display the 42nd line of the access log:
+    $ bin/gora logmanager -get 42 
+    org.apache.gora.tutorial.log.generated.Pageview@321ce053 {
+      "url":"/index.php?i=0&a=1__rntjt9z0q9w&k=398179"
+      "timestamp":"1236710649000"
+      "ip":""
+      "httpMethod":"GET"
+      "httpStatusCode":"200"
+      "responseSize":"43"
+      "referrer":"http://www.buldinle.com/index.php?i=0&a=1__RnTjT9z0Q9w&k=398179"
+      "userAgent":"Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
+    }
+##Querying objects
+The DataStore API defines a Query interface to query the objects in the data store. 
+Each data store implementation can use a specific implementation of the Query interface. Queries are 
+instantiated by calling DataStore#newQuery(). When the query is run through the datastore, the results 
+are returned via the Result interface. Let's see how we can run a query and display the results below in 
+the LogManager class.
+    /** Queries and prints pageview objects that have keys between startKey and endKey*/
+    private void query(long startKey, long endKey) throws IOException {
+      Query<Long, Pageview> query = dataStore.newQuery();
+      //set the properties of query
+      query.setStartKey(startKey);
+      query.setEndKey(endKey);
+      Result<Long, Pageview> result = query.execute();
+      printResult(result);
+    }
+After constructing a Query, its properties 
+are set via the setter methods. Then calling query.execute() returns
+the Result object.
+The Result interface allows us to iterate the results one by one by calling the 
+next() method. The getKey() method returns the current key and get()
+returns the current persistent object.
+    private void printResult(Result<Long, Pageview> result) throws IOException {
+      while(result.next()) { //advances the Result object and breaks if at end
+        long resultKey = result.getKey(); //obtain current key
+        Pageview resultPageview = result.get(); //obtain current value object
+        //print the results
+        System.out.println(resultKey + ":");
+        printPageview(resultPageview);
+      }
+      System.out.println("Number of pageviews from the query:" + result.getOffset());
+    }
+With these functions defined, we can run the LogManager class to query the 
+access logs in HBase. For example, to display the log records between lines 10 and 12 
+we can use:
+    bin/gora logmanager -query 10 12
+Which results in:
+    10:
+    org.apache.gora.tutorial.log.generated.Pageview@d38d0eaa {
+      "url":"/"
+      "timestamp":"1236710442000"
+      "ip":""
+      "httpMethod":"GET"
+      "httpStatusCode":"200"
+      "responseSize":"43"
+      "referrer":"http://buldinle.com/"
+      "userAgent":"Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/2009020911 Ubuntu/8.10 (intrepid) Firefox/3.0.6"
+    }
+    11:
+    org.apache.gora.tutorial.log.generated.Pageview@b513110a {
+      "timestamp":"1236710453000"
+      "ip":""
+      "httpMethod":"GET"
+      "httpStatusCode":"200"
+      "responseSize":"43"
+      "userAgent":"Mozilla/5.0 (Windows; U; Windows NT 5.1; tr; rv: Gecko/2009021910 Firefox/3.0.7"
+    }
+##Deleting objects
+Just like fetching objects, there are two main methods to delete 
+objects from the data store. The first one is to delete objects one by 
+one using the DataStore#delete(K) method, which takes the key of the object. 
+Alternatively, we can delete all of the data that matches a given query by 
+calling the DataStore#deleteByQuery(Query) method. By using deleteByQuery, we can 
+do fine-grained deletes, for example deleting just a specific field 
+from several records. 
+Continuing from the LogManager class, the APIs for both are given below.
+    /**Deletes the pageview with the given line number */
+    private void delete(long lineNum) throws Exception {
+      dataStore.delete(lineNum);
+      dataStore.flush(); //write changes may need to be flushed before they are committed 
+    }
+    /** This method illustrates delete by query call */
+    private void deleteByQuery(long startKey, long endKey) throws IOException {
+      //Constructs a query from the dataStore. The matching rows to this query will be deleted
+      Query<Long, Pageview> query = dataStore.newQuery();
+      //set the properties of query
+      query.setStartKey(startKey);
+      query.setEndKey(endKey);
+      dataStore.deleteByQuery(query);
+    }    
+And from the command line:
+    bin/gora logmanager -delete 12
+    bin/gora logmanager -deleteByQuery 40 50
+##MapReduce Support
+Gora has first-class MapReduce support for [Apache Hadoop](http://hadoop.apache.org). 
+Gora data stores can be used as inputs and outputs of jobs. Moreover, the objects can 
+be serialized and passed between tasks while keeping their persistency state. For the 
+serialization, Gora extends Avro DatumWriters.
+###Log analytics in MapReduce
+For this part of the tutorial, we will be analyzing the logs that have been 
+stored in HBase earlier. Specifically, we will develop a MapReduce program to 
+calculate the number of daily pageviews for each URL in the site.
+We will be using the LogAnalytics class to analyze the logs, which can
+be found at gora-tutorial/src/main/java/org/apache/gora/tutorial/log/LogAnalytics.java.
+For computing the analytics, the mapper takes in pageviews, and outputs 
+&lt;URL, timestamp&gt; tuples as keys, with 1 as the value. The timestamp represents the day 
+on which the pageview occurred, so that the daily pageviews are accumulated. 
+The reducer just sums up the values, and outputs MetricDatum objects 
+to be sent to the output Gora data store.
+###Setting up the environment
+We will be using the logs stored in HBase by the LogManager class. 
+We will push the output of the job to an HSQL database, since it has a zero-configuration 
+setup. However, you can also use MySQL or HBase for storing the analytics results. 
+If you want to continue with HBase, you can skip the next sections. 
+###Setting up the database
+First we need to download the HSQL dependencies. For that, ensure that the hsqldb 
+dependency is available in the Maven pom.xml.
+Of course, MySQL users should uncomment the mysql dependency instead. 
+    <!--<dependency org="org.hsqldb" name="hsqldb" rev="2.0.0" conf="*->default"/>-->
+Then we need to run Maven so that the new dependencies can be downloaded.
+    $ mvn 
+If you are using MySQL, you should also set up the database server, create the database 
+and give the necessary permissions to create tables, etc. so that Gora can run properly.
+###Configuring Gora
+We will put the configuration necessary to connect to the database in gora.properties.
+####JDBC properties for gora-sql module using HSQL
+####JDBC properties for gora-sql module using MySQL
+As expected, the jdbc.driver property is the JDBC driver class,
+and jdbc.url is the JDBC connection URL. Moreover, jdbc.user
+and jdbc.password can be specified if needed. More information on these 
+parameters can be found in the [gora-sql](/gora-sql.html) documentation. 
+###Modelling the data - Data Beans for Analytics  
+For web site analytics, we will be using a generic MetricDatum
+data structure. It holds a string metricDimension field, a long 
+timestamp field, and a long metric field. The first two fields 
+are the dimensions of the web analytics data, and the last is the actual aggregate 
+metric value. For example, we might have an instance {metricDimension="/index", 
+timestamp=101, metric=12}, representing that there have been 12 pageviews to 
+the URL "/index" for the given time interval 101.
+The Avro schema definition for MetricDatum can be found at 
+gora-tutorial/src/main/avro/metricdatum.json, and the compiled source 
+code at gora-tutorial/src/main/java/org/apache/gora/tutorial/log/generated/MetricDatum.java.
+    {
+      "type": "record",
+      "name": "MetricDatum",
+      "namespace": "org.apache.gora.tutorial.log.generated",
+      "fields" : [
+        {"name": "metricDimension", "type": "string"},
+        {"name": "timestamp", "type": "long"},
+        {"name": "metric", "type" : "long"}
+      ]
+    }
+###Data store mappings
+We will be using the SQL backend to store the job output data, just to 
+demonstrate the SQL backend. 
+Similar to what we have seen with HBase, the gora-sql plugin reads its configuration from the 
+gora-sql-mapping.xml file. 
+Specifically, we will use the gora-tutorial/conf/gora-sql-mapping.xml file.    
+    <gora-orm>
+      ...
+      <class name="org.apache.gora.tutorial.log.generated.MetricDatum" keyClass="java.lang.String" table="Metrics">
+        <primarykey column="id" length="512"/>
+        <field name="metricDimension" column="metricDimension" length="512"/>
+        <field name="timestamp" column="ts"/>
+        <field name="metric" column="metric"/>
+      </class>
+    </gora-orm>
+SQL mapping files contain one or more class elements as the children of gora-orm. 
+The key-value pair is declared in the class element. The name attribute is the 
+fully qualified name of the persistent class, and the keyClass attribute is the fully qualified class 
+name of the key class. 
+Children of the class element are field elements and one 
+primarykey element. Each field element has a name 
+and column attribute, and optional jdbc-type, length and scale attributes. 
+The name attribute contains the name of the field in the persistent class, and 
+the column attribute is the name of the 
+column in the database. The primarykey element holds the actual key as the primary key field. Currently, 
+Gora only supports tables with one primary key.
+##Constructing the job 
+In constructing the job object for Hadoop, we need to define whether we will use 
+Gora as job input, output or both. Gora defines 
+its own GoraInputFormat and GoraOutputFormat, which 
+use DataStores as input sources and output sinks for the jobs. 
+The Gora{In|Out}putFormat classes define static methods to set up the job properly.
+However, if the mapper or reducer extends Gora's mapper and reducer classes, 
+you can use the static methods defined in GoraMapper and 
+GoraReducer since they are more convenient. 
+For this tutorial we will use Gora as both input and output. As can be seen from the 
+createJob() function, quoted below, we create the job 
+as normal, and set the input parameters via 
+GoraMapper#initMapperJob() and GoraReducer#initReducerJob(). 
+GoraMapper#initMapperJob() takes a store and an optional query to fetch the data from. 
+When a query is given, only the results of the query are used as the input of the job; otherwise all the records are used. 
+The actual Mapper, map output key and value classes are passed to the initMapperJob() 
+function as well. GoraReducer#initReducerJob() accepts 
+the data store to store the job's output as well as the actual reducer class.
+The initMapperJob and initReducerJob functions also have overloaded variants that take the data store class 
+rather than data store instances.
+      public Job createJob(DataStore<Long, Pageview> inStore
+          , DataStore<String, MetricDatum> outStore, int numReducer) throws IOException {
+        Job job = new Job(getConf());
+        job.setJobName("Log Analytics");
+        job.setNumReduceTasks(numReducer);
+        job.setJarByClass(getClass());
+        /* Mappers are initialized with GoraMapper.initMapperJob() or 
+         * GoraInputFormat.setInput()*/
+        GoraMapper.initMapperJob(job, inStore, TextLong.class, LongWritable.class
+            , LogAnalyticsMapper.class, true);
+        /* Reducers are initialized with GoraReducer.initReducerJob().
+         * If the output is not to be persisted via Gora, any reducer 
+         * can be used instead. */
+        GoraReducer.initReducerJob(job, outStore, LogAnalyticsReducer.class);
+        return job;
+      }
+###Gora mappers and using Gora as input
+Typically, if Gora is used as job input, the Mapper class extends 
+GoraMapper. However, currently this is not enforced by the API, so other class hierarchies can be used instead. 
+The mapper receives the key-value pairs that are the results of the input query, and emits
+the results of the custom map task. Note that output records from map are independent 
+of the input and output data stores, so any Hadoop serializable key-value class can be used. 
+Gora persistent classes are also Hadoop serializable. Hadoop serialization is 
+handled by the PersistentSerialization class. Gora also defines a StringSerialization class, to serialize strings easily. 
+Coming back to the code for the tutorial, we can see that the LogAnalytics 
+class defines an inner class LogAnalyticsMapper which extends 
+GoraMapper. The map function receives Long keys, which are the line 
+numbers, and Pageview values as read from the input data store. The map simply 
+rolls the timestamp up to the day (meaning that only the day of the timestamp is used), 
+and outputs the key as a tuple of &lt;URL,day&gt;.
+    private TextLong tuple;
+    protected void map(Long key, Pageview pageview, Context context) 
+      throws IOException ,InterruptedException {
+      Utf8 url = pageview.getUrl();
+      long day = getDay(pageview.getTimestamp());
+      tuple.getKey().set(url.toString());
+      tuple.getValue().set(day);
+      context.write(tuple, one);
+    };
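The tutorial does not show the getDay() helper used in map(). A minimal sketch of one plausible version follows, assuming timestamps are epoch milliseconds and that days are cut at midnight UTC; that assumption is consistent with the TS values in the job results later in this tutorial (for example 1236902400000), which are exact multiples of 86,400,000. `DayRollupSketch` is a hypothetical name, and the real helper may differ.

```java
/** Hypothetical stand-in for the getDay() helper, which the tutorial does not show. */
final class DayRollupSketch {
  private static final long DAY_MILLIS = 24L * 60 * 60 * 1000;

  /** Truncates an epoch-millisecond timestamp to 00:00 UTC of the same day. */
  static long getDay(long timestampMillis) {
    return Math.floorDiv(timestampMillis, DAY_MILLIS) * DAY_MILLIS;
  }

  public static void main(String[] args) {
    // Timestamp of log line 10 from the query output earlier in the tutorial.
    System.out.println(getDay(1236710442000L)); // prints 1236643200000
  }
}
```

Because every pageview on the same day maps to the same rolled-up value, all of them end up under one &lt;URL,day&gt; key in the reducer.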
+###Gora reducers and using Gora as output
+Similar to the input, if Gora is used as job output, the Reducer typically extends 
+GoraReducer. The values emitted by the reducer are persisted to the output data store 
+as a result of the job. 
+For this tutorial, the LogAnalyticsReducer inner class, 
+which extends GoraReducer, is used as the reducer. The reducer 
+just sums up all the values that correspond to the &lt;URL,day&gt; tuple. 
+Then the MetricDatum object is constructed and emitted, which 
+will be stored in the output data store. 
+    protected void reduce(TextLong tuple
+        , Iterable<LongWritable> values, Context context) 
+      throws IOException ,InterruptedException {
+      long sum = 0L; //sum up the values
+      for(LongWritable value: values) {
+        sum+= value.get();
+      }
+      String dimension = tuple.getKey().toString();
+      long timestamp = tuple.getValue().get();
+      metricDatum.setMetricDimension(new Utf8(dimension));
+      metricDatum.setTimestamp(timestamp);
+      //the key is <dimension>_<day>, so each URL gets one row per day
+      String key = metricDatum.getMetricDimension().toString();
+      key += "_" + Long.toString(timestamp);
+      metricDatum.setMetric(sum);
+      context.write(key, metricDatum);
+    };
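Stripped of Hadoop and Gora, the map and reduce steps amount to grouping pageviews by URL and day and counting each group. The sketch below does that in memory, under the same midnight-UTC assumption as before, and builds keys shaped like the row keys in the HBase scan of the Metrics table shown later (URL, underscore, day). `PageviewCountSketch` and the sample rows are hypothetical; only the two timestamps for "/" are taken from the query output earlier in the tutorial.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory analogue of the map and reduce steps; no Hadoop or Gora involved. */
final class PageviewCountSketch {
  private static final long DAY_MILLIS = 24L * 60 * 60 * 1000;

  /** Counts pageviews per "url_day" key; each input row is {url, epochMillis}. */
  static Map<String, Long> dailyCounts(String[][] pageviews) {
    Map<String, Long> counts = new TreeMap<>();
    for (String[] pv : pageviews) {
      // map step: roll the timestamp up to the day and form the key
      long day = Math.floorDiv(Long.parseLong(pv[1]), DAY_MILLIS) * DAY_MILLIS;
      String key = pv[0] + "_" + day;
      // reduce step: sum a 1 for every pageview with the same key
      counts.merge(key, 1L, Long::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    String[][] sample = {
      {"/", "1236710442000"},
      {"/", "1236710453000"},
      {"/index", "1236902400000"}, // made-up row for illustration
    };
    dailyCounts(sample).forEach((k, v) -> System.out.println(k + " -> " + v));
  }
}
```

In the real job the grouping is done by the MapReduce shuffle rather than by a map in memory, which is what lets it scale beyond a single machine.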
+###Running the job 
+Now that the job is constructed, we can run the Hadoop job as usual. Note that the run function 
+of the LogAnalytics class parses the arguments and runs the job. We can run the program with:
+    $ bin/gora loganalytics [<input data store> [<output data store>]]
+###Running the job with SQL 
+Now, let's run the log analytics tool with the SQL backend (either HSQL or MySQL). The input data store will be 
+    org.apache.gora.hbase.store.HBaseStore 
+and output store will be 
+    org.apache.gora.sql.store.SqlStore
+Remember that we have already configured the database 
+connection properties and which database will be used in the Setting up the environment section.
+    $ bin/gora loganalytics org.apache.gora.hbase.store.HBaseStore  org.apache.gora.sql.store.SqlStore
+Now we should see some logging output from the job, and whether it finished successfully. To check the output 
+when using HSQLDB, the command below can be used.
+    $ java -jar gora-tutorial/lib/hsqldb-2.0.0.jar
+In the connection URL, the same URL that we have provided in gora.properties should be used. If, on the other hand, 
+MySQL is used, then we should be able to see the output using the mysql command line utility. 
+The results of the job are stored in the table Metrics, which is defined in the gora-sql-mapping.xml 
+file. Running a select query over this data confirms that the daily pageview metrics for the web site are indeed stored.
+To see the most popular pages, run:
+    > SELECT METRICDIMENSION, TS, METRIC FROM metrics ORDER BY metric DESC
+<table>
+<tr><th>METRICDIMENSION</th> <th>TS</th> <th>METRIC</th></tr>
+<tr><td>/</td> <td>1236902400000</td> <td>220</td></tr>
+<tr><td>/</td> <td>1236988800000</td> <td>212</td></tr>
+<tr><td>/</td> <td>1236816000000</td> <td>191</td></tr>
+<tr><td>/</td> <td>1237075200000</td> <td>155</td></tr>
+<tr><td>/</td> <td>1241395200000</td> <td>111</td></tr>
+<tr><td>/</td> <td>1236643200000</td> <td>110</td></tr>
+<tr><td>/</td> <td>1236729600000</td> <td>95</td></tr>
+<tr><td>/index.php?a=3__x8g0vi&amp;k=5508310</td> <td>1236816000000</td> <td>45</td></tr>
+<tr><td>/index.php?a=1__5kf9nvgrzos&amp;k=208773</td> <td>1236816000000</td> <td>37</td></tr>
+<tr><td>...</td> <td>...</td> <td>...</td></tr>
+</table>
+As you can see, the home page (/) for various days and some other pages are listed. 
+In total, 3033 rows are present in the metrics table. 
+###Running the job with HBase
+Since HBaseStore is already defined as the default data store in gora.properties,
+we can run the job with HBase as:
+    $ bin/gora loganalytics
+The outputs of the job will be saved in the Metrics table, whose layout is defined in the 
+gora-hbase-mapping.xml file. To see the results:
+    hbase(main):010:0> scan 'Metrics', {LIMIT=>1}
+    ROW                          COLUMN+CELL
+     /?a=1__-znawtuabsy&k=96804_ column=common:metric, timestamp=1289815441740, value=\x00\x00\x00\x00\x00\x00\x00
+     1236902400000               \x09
+     /?a=1__-znawtuabsy&k=96804_ column=common:metricDimension, timestamp=1289815441740, value=/?a=1__-znawtuabsy&
+     1236902400000               k=96804
+     /?a=1__-znawtuabsy&k=96804_ column=common:ts, timestamp=1289815441740, value=\x00\x00\x01\x1F\xFD \xD0\x00
+     1236902400000
+    1 row(s) in 0.0490 seconds
+##More Examples
+Other than this tutorial, there are several places where you can find 
+examples of Gora in action.
+The first place to look is the examples directories 
+under the various Gora modules. All the modules have a &lt;gora-module&gt;/src/examples/ directory 
+under which some example classes can be found. In particular, there are some classes that are used for tests under 
+Second, the unit tests of the Gora modules can be consulted to see the API in use. The unit tests can be found 
+at &lt;gora-module&gt;/src/test/
+The source code of projects using Gora can also be checked out as a reference. [Apache Nutch](http://nutch.apache.org) is 
+one of the first-class users of Gora, so looking into how Nutch uses Gora is always a good idea.
+Please feel free to grab our [poweredBy](http://gora.apache.org/images/powered-by-gora.png) sticker and embed it in anything backed by Apache Gora.
+Finally, thanks for trying out Gora. If you find any bugs or have suggestions for improvement, 
+do not hesitate to give feedback on the dev@gora.apache.org [mailing list](../mailing_lists.html).
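
The HBase scan output in the tutorial text above prints cell values as escaped bytes, which makes the stored metric and timestamp hard to read. The sketch below decodes them, assuming the long fields were written as 8-byte big-endian integers (the layout HBase's `Bytes.toBytes(long)` produces); that assumption, and the helper name `decode_long`, are not taken from the commit. The shell prints printable bytes as characters, so the space inside the `common:ts` value is the byte 0x20.

```python
import struct
from datetime import datetime, timezone


def decode_long(raw: bytes) -> int:
    """Interpret 8 bytes as a signed big-endian 64-bit integer.

    Assumption: the store serialized the long this way; the helper is
    illustrative and not part of Gora or HBase.
    """
    if len(raw) != 8:
        raise ValueError(f"expected 8 bytes, got {len(raw)}")
    return struct.unpack(">q", raw)[0]


# Cell values copied from the scan output (common:metric and common:ts).
metric_raw = b"\x00\x00\x00\x00\x00\x00\x00\x09"
ts_raw = b"\x00\x00\x01\x1F\xFD \xD0\x00"

metric = decode_long(metric_raw)
ts_millis = decode_long(ts_raw)

# The TS column holds milliseconds since the Unix epoch.
day = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)

print(metric, ts_millis, day.date().isoformat())
# prints: 9 1236902400000 2009-03-13
```

The decoded timestamp equals the numeric suffix of the row key (1236902400000), which supports the big-endian reading.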

Modified: gora/cms_site/trunk/content/downloads.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/downloads.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/downloads.md (original)
+++ gora/cms_site/trunk/content/downloads.md Tue Mar 19 01:12:35 2013
@@ -1,4 +1,4 @@
-Gora Releases 
+Title: Gora Releases 
 Download the newest release of Apache Gora. See the [0.2.1-CHANGES.txt](http://apache.org/dist/gora/0.2.1/CHANGES.txt)
 file for more information on the list of updates in this release.
@@ -9,44 +9,47 @@ The link in the Mirrors column below sho
 based on your inferred location. If you do not see that page, try a different browser. 
 The checksum and signature are links to the originals on the main distribution server.
-Version Checksum Signature
-[Apache Gora 0.2.1 (tar.gz)](http://www.apache.org/dyn/closer.cgi/gora/0.2.1/apache-gora-0.2.1-src.tar.gz)
-[Apache Gora 0.2.1 (tar.gz.md5)](http://www.apache.org/dist/gora/0.2.1/apache-gora-0.2.1-src.tar.gz.md5)
-[Apache Gora 0.2.1 (tar.gz.asc)](http://www.apache.org/dist/gora/0.2.1/apache-gora-0.2.1-src.tar.gz.asc)
+  <caption>Downloads</caption>
+  <tr><th align="left">Version</th> <th align="left">Mirrors</th> <th align="left">Checksum</th> <th align="left">Signature</th></tr>
+  <tr><td>Apache Gora 0.2.1 (tar.gz)</td><td><a href="http://www.apache.org/dyn/closer.cgi/gora/0.2.1/apache-gora-0.2.1-src.tar.gz">
+       apache-gora-0.2.1-src.tar.gz</a></td> <td><a href="http://www.apache.org/dist/gora/0.2.1/apache-gora-0.2.1-src.tar.gz.md5">
+       apache-gora-0.2.1-src.tar.gz.md5</a> </td> <td><a href="http://www.apache.org/dist/gora/0.2.1/apache-gora-0.2.1-src.tar.gz.asc">
+       apache-gora-0.2.1-src.tar.gz.asc</a> </td></tr>
+  <tr><td>Apache Gora 0.2.1 (zip)</td><td><a href="http://www.apache.org/dyn/closer.cgi/gora/0.2.1/apache-gora-0.2.1-src.zip">
+       apache-gora-0.2.1-src.zip</a></td><td><a href="http://www.apache.org/dist/gora/0.2.1/apache-gora-0.2.1-src.zip.md5">
+       apache-gora-0.2.1-src.zip.md5</a></td><td><a href="http://www.apache.org/dist/gora/0.2.1/apache-gora-0.2.1-src.zip.asc">
+       apache-gora-0.2.1-src.zip.asc</a></td></tr>
-[Apache Gora 0.2.1 (zip)](http://www.apache.org/dyn/closer.cgi/gora/0.2.1/apache-gora-0.2.1-src.zip)
-[Apache Gora 0.2.1 (zip.md5)](http://www.apache.org/dist/gora/0.2.1/apache-gora-0.2.1-src.zip.md5)
-[Apache Gora 0.2.1 (zip.asc)](http://www.apache.org/dist/gora/0.2.1/apache-gora-0.2.1-src.zip.asc)
-Verify Releases
+##Verify Releases
 It is essential that you verify the integrity of the downloaded files using the PGP or MD5 signatures. 
 Please read [Verifying Apache HTTP Server Releases](http://httpd.apache.org/dev/verification.html) 
 for more information on why you should verify our releases.
 We strongly recommend you verify your downloads with both PGP and MD5.
-PGP Signature
+##PGP Signature
 The PGP signatures can be verified using PGP or GPG. First download the 
 [KEYS](http://www.apache.org/dist/gora/KEYS) as well as the asc signature file 
 for the relevant distribution. Make sure you get these files from the 
 [main distribution directory](http://www.apache.org/dist/gora/), rather than from a 
 mirror. Then verify the signatures using the following
-$ gpg --import KEYS
-$ gpg --verify apache-gora-X.Y.Z-src.tar.gz.asc
+    $ gpg --import KEYS
+    $ gpg --verify apache-gora-X.Y.Z-src.tar.gz.asc
 The files in the most recent release are signed by Lewis John McGibbney (lewismc) C601BCA7
-MD5 Signature
+##MD5 Signature
 Alternatively, you can verify the MD5 signature on the files. 
 A unix program called md5 or md5sum is included in many unix distributions.
-$ md5sum apache-gora-X.Y.Z-src.tar.gz.asc
+    $ md5sum apache-gora-X.Y.Z-src.tar.gz
 ... output should match the string in apache-gora-X.Y.Z-src.tar.gz.md5
-Previous Releases
+##Previous Releases
 If you are looking for previous releases of Apache Gora, have a look in the 
 [Apache Archives](http://archive.apache.org/dist/gora/), or alternatively 
 for even older releases check out the [Incubator archives](http://archive.apache.org/dist/incubator/gora/).
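
Where `md5` or `md5sum` is not available, the MD5 check described in the downloads page can be sketched in Python. This is a minimal illustration, assuming the published `.md5` file contains the hex digest somewhere in its text (possibly uppercase or split by whitespace); the function names are hypothetical and the file names are the placeholders used in the page.

```python
import hashlib
import sys
from pathlib import Path


def md5_hex(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(archive_path, md5_path):
    """Check the archive's digest against the published .md5 file.

    Assumption: the .md5 file holds the digest as hex text; whitespace
    and letter case are normalized before comparing.
    """
    published = "".join(Path(md5_path).read_text().split()).lower()
    return md5_hex(archive_path) in published


if __name__ == "__main__" and len(sys.argv) == 3:
    # Example: python check_md5.py apache-gora-X.Y.Z-src.tar.gz \
    #          apache-gora-X.Y.Z-src.tar.gz.md5
    ok = matches_published(sys.argv[1], sys.argv[2])
    print("MD5 OK" if ok else "MD5 MISMATCH")
    sys.exit(0 if ok else 1)
```

A matching MD5 only detects accidental corruption; the PGP signature check remains the stronger verification.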

Modified: gora/cms_site/trunk/content/index.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/index.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/index.md (original)
+++ gora/cms_site/trunk/content/index.md Tue Mar 19 01:12:35 2013
@@ -1,6 +1,6 @@
 Title: Welcome to Apache Gora&trade;
-Welcome to the Apache Gora project!
+#Welcome to the Apache Gora project!
 What is Apache Gora?
 The Apache Gora open source framework provides an in-memory data model and persistence 

Modified: gora/cms_site/trunk/content/mailing_lists.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/mailing_lists.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/mailing_lists.md (original)
+++ gora/cms_site/trunk/content/mailing_lists.md Tue Mar 19 01:12:35 2013
@@ -1,6 +1,6 @@
 Title: Gora Mailing Lists
 If you use Gora, please subscribe to the Gora user mailing list.
 The Gora user mailing list is :
@@ -10,7 +10,7 @@ The Gora user mailing list is :
 * [View List Archive](http://mail-archives.apache.org/mod_mbox/gora-user/)
 In order to post to the list, it is necessary to first subscribe to it.
 If you'd like to contribute to Gora, please subscribe to the
 Gora developer mailing list.
 The Gora developer mailing list is :
@@ -21,7 +21,7 @@ The Gora developer mailing list is :
 * [View List Archive](http://mail-archives.apache.org/mod_mbox/gora-dev/)
 In order to post to the list, it is necessary to first subscribe to it.
+##Developers and Committers
 If you'd like to see changes made in Gora's [version control system](/version_control.html)
 then please subscribe to the Gora commit's mailing list.
 The Gora commit's mailing list is :

Modified: gora/cms_site/trunk/content/nightly_builds.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/nightly_builds.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/nightly_builds.md (original)
+++ gora/cms_site/trunk/content/nightly_builds.md Tue Mar 19 01:12:35 2013
@@ -1,4 +1,5 @@
 Title: Automated Nightly Builds
 We're using the Apache Jenkins build server for continuous builds.
 [Trunk build](http://builds.apache.org/job/Gora-trunk/)

Modified: gora/cms_site/trunk/content/version_control.md
URL: http://svn.apache.org/viewvc/gora/cms_site/trunk/content/version_control.md?rev=1458075&r1=1458074&r2=1458075&view=diff
--- gora/cms_site/trunk/content/version_control.md (original)
+++ gora/cms_site/trunk/content/version_control.md Tue Mar 19 01:12:35 2013
@@ -1,42 +1,45 @@
 Title: Gora Version Control System
 Gora uses the official subversion repository of the Apache Software Foundation.
 However, Apache also provides read only mirrors for git users. Below you can find 
 how to use subversion or git to access Gora's source code.
-Subversion Repository
+#Subversion Repository
-Subversion Clients
+##Subversion Clients
 The Gora source code resides in the [Apache Subversion](http://subversion.tigris.org/)
 (SVN) repository. The command-line SVN client can be obtained [here](http://subversion.tigris.org/project_packages.html).
 The TortoiseSVN GUI client for Windows can be obtained [here](http://tortoisesvn.tigris.org/). 
 There are also SVN plugins available for both [Eclipse](http://subclipse.tigris.org/) and
 [IntelliJ IDEA](http://svnup.tigris.org/).
-Web Access (read-only)
+##Web Access (read-only)
 The source code can be browsed via the Web at 
 No SVN client software is required.
-Anonymous Access (read-only)
+##Anonymous Access (read-only)
 The SVN URL for anonymous users is [http://svn.apache.org/repos/asf/gora/trunk](http://svn.apache.org/repos/asf/gora/trunk).
 Instructions for anonymous SVN access are [here](http://www.apache.org/dev/version-control.html#anon-svn).
-Committer Access (read-write)
+##Committer Access (read-write)
 The SVN URL for committers is [https://svn.apache.org/repos/asf/gora/trunk](https://svn.apache.org/repos/asf/gora/trunk).
 Instructions for committer SVN access are [here](http://www.apache.org/dev/version-control.html#https-svn).
-Git Repository
+#Git Repository
-Anonymous Access (read-only)
+##Anonymous Access (read-only)
 The apache git repository can be used for accessing the repository.
 The URL for anonymous read-only access is [http://git.apache.org/gora.git/](http://git.apache.org/gora.git/). 
 Alternatively the Github mirror at [http://github.com/apache/gora](http://github.com/apache/gora) can also be used. 
 The repository can be cloned by:
-$ git clone http://git.apache.org/gora.git/ 
+    $ git clone http://git.apache.org/gora.git/ 
 More instructions for setting up git access can be found [here](http://www.apache.org/dev/git.html).
-Committer Access (read-write)
+##Committer Access (read-write)
 Currently the subversion repository is the only official repository 
 that has Gora write access. However, committers can use the git-svn package
 to bypass svn and use a git-only workflow. Setting up the initial repository
