gora-commits mailing list archives

From kam...@apache.org
Subject svn commit: r1710369 - in /gora/site/trunk/content/current: gora-core.md index.md
Date Sat, 24 Oct 2015 18:08:54 GMT
Author: kamaci
Date: Sat Oct 24 18:08:54 2015
New Revision: 1710369

URL: http://svn.apache.org/viewvc?rev=1710369&view=rev
Log:
GoraSparkEngine explanation is added.

Modified:
    gora/site/trunk/content/current/gora-core.md
    gora/site/trunk/content/current/index.md

Modified: gora/site/trunk/content/current/gora-core.md
URL: http://svn.apache.org/viewvc/gora/site/trunk/content/current/gora-core.md?rev=1710369&r1=1710368&r2=1710369&view=diff
==============================================================================
--- gora/site/trunk/content/current/gora-core.md (original)
+++ gora/site/trunk/content/current/gora-core.md Sat Oct 24 18:08:54 2015
@@ -9,7 +9,7 @@ Every module
 in gora depends on gora-core therefore most of the generic documentation 
 about the project is gathered here as well as the documentation for <code>AvroStore</code>,
 <code>DataFileAvroStore</code> and <code>MemStore</code>. In addition to this, gora-core holds all of the 
-core **MapReduce**, **Persistency**, **Query**, **DataStoreBase** and **Utility** functionality.
+core **MapReduce**, **GoraSparkEngine**, **Persistency**, **Query**, **DataStoreBase** and **Utility** functionality.
 
 [TOC]
 
@@ -122,3 +122,39 @@ MemStore would be configured exactly the
 ##MemStore XML mappings
 In the stores covered within the gora-core module, no physical mappings are required.
 
+#GoraSparkEngine
+##Description
+GoraSparkEngine is the Spark backend of Apache Gora. Assume that the input and output data stores are:
+
+    DataStore<K1, V1> inStore;
+    DataStore<K2, V2> outStore;
+
+The first step in using GoraSparkEngine is to initialize it:
+
+    GoraSparkEngine<K1, V1> goraSparkEngine = new GoraSparkEngine<>(K1.class, V1.class);
+
+Construct a `JavaSparkContext`. Register the input data store’s value class as a Kryo class:
+
+    SparkConf sparkConf = new SparkConf().setAppName("Gora Spark Integration Application").setMaster("local");
+    Class[] c = new Class[1];
+    c[0] = inStore.getPersistentClass();
+    sparkConf.registerKryoClasses(c);
+    JavaSparkContext sc = new JavaSparkContext(sparkConf);
+
+A `JavaPairRDD` can be retrieved from the input data store:
+
+    JavaPairRDD<Long, Pageview> goraRDD = goraSparkEngine.initialize(sc, inStore);
+
+After that, all Spark functionality can be applied. For example, a count can be run as follows:
+
+    long count = goraRDD.count();
+
+Map and reduce functions can be run on a `JavaPairRDD` as well. Assume that this is the variable holding the result after map/reduce is applied:
+
+    JavaPairRDD<String, MetricDatum> mapReducedGoraRdd;
+
+The result can be written as follows:
+
+    Configuration sparkHadoopConf = goraSparkEngine.generateOutputConf(outStore);
+    mapReducedGoraRdd.saveAsNewAPIHadoopDataset(sparkHadoopConf);
+
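
The added text leaves the map/reduce step itself unspecified. As a rough illustration of its shape only, the sketch below maps each record to a (key, value) pair and then merges the values that share a key, using plain Java collections so that it runs without Spark or Gora on the classpath. `MapReduceShape`, `PageviewRecord` and its `url` and `responseSize` fields are hypothetical stand-ins, not Gora classes. On a real `JavaPairRDD` the same two steps would typically be expressed with `mapToPair` followed by `reduceByKey`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class MapReduceShape {

    // Hypothetical stand-in for a persistent value class read from the input store.
    static class PageviewRecord {
        final String url;
        final long responseSize;

        PageviewRecord(String url, long responseSize) {
            this.url = url;
            this.responseSize = responseSize;
        }
    }

    // Sums response sizes per URL.
    static Map<String, Long> bytesPerUrl(List<PageviewRecord> views) {
        return views.stream().collect(Collectors.toMap(
                v -> v.url,           // map step: choose the output key
                v -> v.responseSize,  // map step: choose the output value
                Long::sum,            // reduce step: merge values that share a key
                TreeMap::new));       // sorted map, so the printed order is stable
    }

    public static void main(String[] args) {
        List<PageviewRecord> views = Arrays.asList(
                new PageviewRecord("/a", 10L),
                new PageviewRecord("/b", 5L),
                new PageviewRecord("/a", 20L));
        System.out.println(bytesPerUrl(views)); // prints {/a=30, /b=5}
    }
}
```

The merge function passed as the third argument plays the role a reduce function plays per key: it must be associative, because the order in which values for one key are combined is not guaranteed in a distributed run.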

Modified: gora/site/trunk/content/current/index.md
URL: http://svn.apache.org/viewvc/gora/site/trunk/content/current/index.md?rev=1710369&r1=1710368&r2=1710369&view=diff
==============================================================================
--- gora/site/trunk/content/current/index.md (original)
+++ gora/site/trunk/content/current/index.md Sat Oct 24 18:08:54 2015
@@ -31,7 +31,7 @@ following modules are currently implemen
 * [gora-shims-hadoop-1.x](./gora-shims.html): Module enabling us to use Gora with Hadoop 1.X;
 * [gora-shims-hadoop-2.x](./gora-shims.html): Module enabling us to use Gora with Hadoop 2.X;
 * [gora-shims-hadoop-distribution](./gora-shims.html): Packaging container module enabling easier dependency management whilst working with Gora Shims;
-* [gora-core](./gora-core.html): Module containing core functionality, AvroStore and DataFileAvroStore stores;
+* [gora-core](./gora-core.html): Module containing core functionality, AvroStore and DataFileAvroStore stores, GoraSparkEngine;
 * [gora-accumulo](./gora-accumulo.html): Module for [Apache Accumulo](http://accumulo.apache.org) backend and AccumuloStore implementation;
 * [camel-gora](./gora-camel.html): An [Apache Camel](http://camel.apache.org/) component that allows you to work with NoSQL databases using Gora;
 * [gora-cassandra](./gora-cassandra.html): Module for [Apache Cassandra](http://cassandra.apacheorg) backend and CassandraStore implementation;


