gora-commits mailing list archives

From build...@apache.org
Subject svn commit: r970127 - in /websites/staging/gora/trunk/content: ./ current/gora-core.html current/tutorial.html
Date Sat, 24 Oct 2015 23:07:37 GMT
Author: buildbot
Date: Sat Oct 24 23:07:36 2015
New Revision: 970127

Log:
Staging update by buildbot for gora

Modified:
    websites/staging/gora/trunk/content/   (props changed)
    websites/staging/gora/trunk/content/current/gora-core.html
    websites/staging/gora/trunk/content/current/tutorial.html

Propchange: websites/staging/gora/trunk/content/
------------------------------------------------------------------------------
--- cms:source-revision (original)
+++ cms:source-revision Sat Oct 24 23:07:36 2015
@@ -1 +1 @@
-1710395
+1710397

Modified: websites/staging/gora/trunk/content/current/gora-core.html
==============================================================================
--- websites/staging/gora/trunk/content/current/gora-core.html (original)
+++ websites/staging/gora/trunk/content/current/gora-core.html Sat Oct 24 23:07:36 2015
@@ -303,7 +303,7 @@ This datastore supports MapReduce.</p>
 <p>In the stores covered within the gora-core module, no physical mappings are required.</p>
 <h1 id="gorasparkengine">GoraSparkEngine<a class="headerlink" href="#gorasparkengine"
title="Permanent link">&para;</a></h1>
 <h2 id="description_3">Description<a class="headerlink" href="#description_3" title="Permanent
link">&para;</a></h2>
-<p>GoraSparkEngine is Spark backend of Apache Gora. Assume that input and output data
stores are:</p>
+<p>GoraSparkEngine is the Spark backend of Gora. Assume that the input and output data stores
are:</p>
 <div class="codehilite"><pre><span class="n">DataStore</span><span
class="o">&lt;</span><span class="n">K1</span><span class="p">,</span>
<span class="n">V1</span><span class="o">&gt;</span> <span
class="n">inStore</span><span class="p">;</span>
 <span class="n">DataStore</span><span class="o">&lt;</span><span
class="n">K2</span><span class="p">,</span> <span class="n">V2</span><span
class="o">&gt;</span> <span class="n">outStore</span><span class="p">;</span>
 </pre></div>

Modified: websites/staging/gora/trunk/content/current/tutorial.html
==============================================================================
--- websites/staging/gora/trunk/content/current/tutorial.html (original)
+++ websites/staging/gora/trunk/content/current/tutorial.html Sat Oct 24 23:07:36 2015
@@ -229,6 +229,7 @@ MapReduce API in some detail.</p>
 <li><a href="#running-the-job-with-hbase">Running the job with HBase</a></li>
 </ul>
 </li>
+<li><a href="#spark-backend">Spark Backend</a></li>
 <li><a href="#more-examples">More Examples</a></li>
 <li><a href="#feedback">Feedback</a></li>
 </ul>
@@ -1189,6 +1190,136 @@ we can run the job with HBase as:</p>
 </pre></div>
 
 
+<h2 id="spark-backend">Spark Backend<a class="headerlink" href="#spark-backend"
title="Permanent link">&para;</a></h2>
+<p>This section implements the log analytics example with GoraSparkEngine to demonstrate the Spark backend of Gora.
+Data is read from HBase, the map and reduce functions are run, and the result is written into Solr (version 4.10.3).
+The whole process runs on Spark.</p>
+<p>First, persist the data into HBase as described in <a href="/current/tutorial.html#log-analytics-in-mapreduce">Log
analytics in MapReduce</a>.</p>
+<p>To write the result into Solr, create a schemaless core named Metrics. The easiest way is to
rename the default core, collection1, to Metrics in the
+<code>solr-4.10.3/example/example-schemaless/solr</code> folder and edit <code>solr-4.10.3/example/example-schemaless/solr/Metrics/core.properties</code>
as follows:</p>
+<div class="codehilite"><pre><span class="n">name</span><span
class="p">=</span><span class="n">Metrics</span>
+</pre></div>
+
+
+<p>Then start Solr from the example folder:</p>
+<div class="codehilite"><pre><span class="n">solr</span><span
class="o">-</span>4<span class="p">.</span>10<span class="p">.</span>3<span
class="o">/</span><span class="n">example</span>$ <span class="n">java</span>
<span class="o">-</span><span class="n">Dsolr</span><span class="p">.</span><span
class="n">solr</span><span class="p">.</span><span class="n">home</span><span
class="p">=</span><span class="n">example</span><span class="o">-</span><span
class="n">schemaless</span><span class="o">/</span><span class="n">solr</span><span
class="o">/</span> <span class="o">-</span><span class="n">jar</span>
<span class="n">start</span><span class="p">.</span><span class="n">jar</span>
+</pre></div>
+
+
+<p>Read data from HBase, generate some metrics, and write the results into Solr with Spark
via Gora. Here is how to initialize the input and output data stores:</p>
+<div class="codehilite"><pre><span class="n">public</span> <span
class="n">int</span> <span class="n">run</span><span class="p">(</span><span
class="n">String</span><span class="p">[]</span> <span class="n">args</span><span
class="p">)</span> <span class="n">throws</span> <span class="n">Exception</span>
<span class="p">{</span>
+  <span class="n">DataStore</span><span class="o">&lt;</span><span
class="n">Long</span><span class="p">,</span> <span class="n">Pageview</span><span
class="o">&gt;</span> <span class="n">inStore</span><span class="p">;</span>
+  <span class="n">DataStore</span><span class="o">&lt;</span><span
class="n">String</span><span class="p">,</span> <span class="n">MetricDatum</span><span
class="o">&gt;</span> <span class="n">outStore</span><span class="p">;</span>
+  <span class="n">Configuration</span> <span class="n">hadoopConf</span>
<span class="p">=</span> <span class="n">new</span> <span class="n">Configuration</span><span
class="p">();</span>
+  <span class="k">if</span> <span class="p">(</span><span class="n">args</span><span
class="p">.</span><span class="nb">length</span> <span class="o">&gt;</span>
0<span class="p">)</span> <span class="p">{</span>
+    <span class="n">String</span> <span class="n">dataStoreClass</span>
<span class="p">=</span> <span class="n">args</span><span class="p">[</span>0<span
class="p">];</span>
+    <span class="n">inStore</span> <span class="p">=</span> <span
class="n">DataStoreFactory</span><span class="p">.</span><span class="n">getDataStore</span><span
class="p">(</span><span class="n">dataStoreClass</span><span class="p">,</span>
<span class="n">Long</span><span class="p">.</span><span class="n">class</span><span
class="p">,</span> <span class="n">Pageview</span><span class="p">.</span><span
class="n">class</span><span class="p">,</span> <span class="n">hadoopConf</span><span
class="p">);</span>
+    <span class="k">if</span> <span class="p">(</span><span class="n">args</span><span
class="p">.</span><span class="nb">length</span> <span class="o">&gt;</span>
1<span class="p">)</span> <span class="p">{</span>
+      <span class="n">dataStoreClass</span> <span class="p">=</span>
<span class="n">args</span><span class="p">[</span>1<span class="p">];</span>
+    <span class="p">}</span>
+    <span class="n">outStore</span> <span class="p">=</span> <span
class="n">DataStoreFactory</span><span class="p">.</span><span class="n">getDataStore</span><span
class="p">(</span><span class="n">dataStoreClass</span><span class="p">,</span>
<span class="n">String</span><span class="p">.</span><span class="n">class</span><span
class="p">,</span> <span class="n">MetricDatum</span><span class="p">.</span><span
class="n">class</span><span class="p">,</span> <span class="n">hadoopConf</span><span
class="p">);</span>
+    <span class="p">}</span> <span class="k">else</span> <span
class="p">{</span>
+      <span class="n">inStore</span> <span class="p">=</span> <span
class="n">DataStoreFactory</span><span class="p">.</span><span class="n">getDataStore</span><span
class="p">(</span><span class="n">Long</span><span class="p">.</span><span
class="n">class</span><span class="p">,</span> <span class="n">Pageview</span><span
class="p">.</span><span class="n">class</span><span class="p">,</span>
<span class="n">hadoopConf</span><span class="p">);</span>
+      <span class="n">outStore</span> <span class="p">=</span> <span
class="n">DataStoreFactory</span><span class="p">.</span><span class="n">getDataStore</span><span
class="p">(</span><span class="n">String</span><span class="p">.</span><span
class="n">class</span><span class="p">,</span> <span class="n">MetricDatum</span><span
class="p">.</span><span class="n">class</span><span class="p">,</span>
<span class="n">hadoopConf</span><span class="p">);</span>
+  <span class="p">}</span>
+ <span class="p">...</span>
+<span class="p">}</span>
+</pre></div>
+
+
+<p>Instantiate a GoraSparkEngine, passing the input data store’s key and value classes:</p>
+<div class="codehilite"><pre><span class="n">GoraSparkEngine</span><span
class="o">&lt;</span><span class="n">Long</span><span class="p">,</span>
<span class="n">Pageview</span><span class="o">&gt;</span> <span
class="n">goraSparkEngine</span> <span class="p">=</span> <span class="n">new</span>
<span class="n">GoraSparkEngine</span><span class="o">&lt;&gt;</span><span
class="p">(</span><span class="n">Long</span><span class="p">.</span><span
class="n">class</span><span class="p">,</span> <span class="n">Pageview</span><span
class="p">.</span><span class="n">class</span><span class="p">);</span>
+</pre></div>
+
+
+<p>Construct a JavaSparkContext, registering the input data store’s value class as a Kryo
class:</p>
+<div class="codehilite"><pre><span class="n">SparkConf</span> <span
class="n">sparkConf</span> <span class="p">=</span> <span class="n">new</span>
<span class="n">SparkConf</span><span class="p">().</span><span
class="n">setAppName</span><span class="p">(</span>&quot;<span
class="n">Gora</span> <span class="n">Spark</span> <span class="n">Integration</span>
<span class="n">Application</span>&quot;<span class="p">).</span><span
class="n">setMaster</span><span class="p">(</span>&quot;<span
class="n">local</span>&quot;<span class="p">);</span>
+<span class="n">Class</span><span class="p">[]</span> <span class="n">c</span>
<span class="p">=</span> <span class="n">new</span> <span class="n">Class</span><span
class="p">[</span>1<span class="p">];</span>
+<span class="n">c</span><span class="p">[</span>0<span class="p">]</span>
<span class="p">=</span> <span class="n">inStore</span><span class="p">.</span><span
class="n">getPersistentClass</span><span class="p">();</span>
+<span class="n">sparkConf</span><span class="p">.</span><span
class="n">registerKryoClasses</span><span class="p">(</span><span
class="n">c</span><span class="p">);</span>
+<span class="n">JavaSparkContext</span> <span class="n">sc</span>
<span class="p">=</span> <span class="n">new</span> <span class="n">JavaSparkContext</span><span
class="p">(</span><span class="n">sparkConf</span><span class="p">);</span>
+</pre></div>
+
+
+<p>You can get a JavaPairRDD from the input data store:</p>
+<div class="codehilite"><pre><span class="n">JavaPairRDD</span><span
class="o">&lt;</span><span class="n">Long</span><span class="p">,</span>
<span class="n">Pageview</span><span class="o">&gt;</span> <span
class="n">goraRDD</span> <span class="p">=</span> <span class="n">goraSparkEngine</span><span
class="p">.</span><span class="n">initialize</span><span class="p">(</span><span
class="n">sc</span><span class="p">,</span> <span class="n">inStore</span><span
class="p">);</span>
+</pre></div>
+
+
+<p>Once you have it, you can work with it just as you would in any other Spark program. For
example:</p>
+<div class="codehilite"><pre><span class="n">long</span> <span
class="n">count</span> <span class="p">=</span> <span class="n">goraRDD</span><span
class="p">.</span><span class="n">count</span><span class="p">();</span>
+<span class="n">System</span><span class="p">.</span><span class="n">out</span><span
class="p">.</span><span class="n">println</span><span class="p">(</span>&quot;<span
class="n">Total</span> <span class="n">Log</span> <span class="n">Count</span><span
class="p">:</span> &quot; <span class="o">+</span> <span class="n">count</span><span
class="p">);</span>
+</pre></div>
+
+
+<p>Here are the functions for the map and reduce phases of this example:</p>
+<div class="codehilite"><pre><span class="cm">/** The number of milliseconds
in a day */</span>
+<span class="n">private</span> <span class="k">static</span> <span
class="k">final</span> <span class="n">long</span> <span class="no">DAY_MILIS</span>
<span class="o">=</span> <span class="mh">1000</span> <span class="o">*</span>
<span class="mh">60</span> <span class="o">*</span> <span class="mh">60</span>
<span class="o">*</span> <span class="mh">24</span><span class="p">;</span>
+
+<span class="cm">/**</span>
+<span class="cm">* map function used in calculation</span>
+<span class="cm">*/</span>
+<span class="n">private</span> <span class="k">static</span> <span
class="n">Function</span><span class="o">&lt;</span><span class="n">Pageview</span><span
class="p">,</span> <span class="n">Tuple2</span><span class="o">&lt;</span><span
class="n">Tuple2</span><span class="o">&lt;</span><span class="n">String</span><span
class="p">,</span> <span class="n">Long</span><span class="o">&gt;</span><span
class="p">,</span> <span class="n">Long</span><span class="o">&gt;&gt;</span>
<span class="n">mapFunc</span> <span class="o">=</span> <span class="k">new</span>
<span class="n">Function</span><span class="o">&lt;</span><span
class="n">Pageview</span><span class="p">,</span> <span class="n">Tuple2</span><span
class="o">&lt;</span><span class="n">Tuple2</span><span class="o">&lt;</span><span
class="n">String</span><span class="p">,</span> <span class="n">Long</span><span
class="o">&gt;</span><span class="p">,</span> <span class="n">Long</span><span
class="o">&gt;&gt;</span><span class="p">()</span> <span class="p">{</span>
+  <span class="p">@</span><span class="n">Override</span>
+  <span class="n">public</span> <span class="n">Tuple2</span><span
class="o">&lt;</span><span class="n">Tuple2</span><span class="o">&lt;</span><span
class="n">String</span><span class="p">,</span> <span class="n">Long</span><span
class="o">&gt;</span><span class="p">,</span> <span class="n">Long</span><span
class="o">&gt;</span> <span class="n">call</span><span class="p">(</span><span
class="n">Pageview</span> <span class="n">pageview</span><span class="p">)</span>
<span class="n">throws</span> <span class="n">Exception</span> <span
class="p">{</span>
+    <span class="n">String</span> <span class="n">url</span> <span
class="o">=</span> <span class="n">pageview</span><span class="p">.</span><span
class="n">getUrl</span><span class="p">().</span><span class="n">toString</span><span
class="p">();</span>
+    <span class="n">Long</span> <span class="n">day</span> <span
class="o">=</span> <span class="n">getDay</span><span class="p">(</span><span
class="n">pageview</span><span class="p">.</span><span class="n">getTimestamp</span><span
class="p">());</span>
+    <span class="n">Tuple2</span><span class="o">&lt;</span><span
class="n">String</span><span class="p">,</span> <span class="n">Long</span><span
class="o">&gt;</span> <span class="n">keyTuple</span> <span class="o">=</span>
<span class="k">new</span> <span class="n">Tuple2</span><span class="o">&lt;&gt;</span><span
class="p">(</span><span class="n">url</span><span class="p">,</span>
<span class="n">day</span><span class="p">);</span>
+    <span class="k">return</span> <span class="k">new</span> <span
class="n">Tuple2</span><span class="o">&lt;&gt;</span><span
class="p">(</span><span class="n">keyTuple</span><span class="p">,</span>
<span class="mh">1</span><span class="no">L</span><span class="p">);</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+
+<span class="cm">/**</span>
+<span class="cm">* reduce function used in calculation</span>
+<span class="cm">*/</span>
+<span class="n">private</span> <span class="k">static</span> <span
class="n">Function2</span><span class="o">&lt;</span><span class="n">Long</span><span
class="p">,</span> <span class="n">Long</span><span class="p">,</span>
<span class="n">Long</span><span class="o">&gt;</span> <span
class="n">redFunc</span> <span class="o">=</span> <span class="k">new</span>
<span class="n">Function2</span><span class="o">&lt;</span><span
class="n">Long</span><span class="p">,</span> <span class="n">Long</span><span
class="p">,</span> <span class="n">Long</span><span class="o">&gt;</span><span
class="p">()</span> <span class="p">{</span>
+  <span class="p">@</span><span class="n">Override</span>
+  <span class="n">public</span> <span class="n">Long</span> <span
class="n">call</span><span class="p">(</span><span class="n">Long</span>
<span class="n">aLong</span><span class="p">,</span> <span class="n">Long</span>
<span class="n">aLong2</span><span class="p">)</span> <span class="n">throws</span>
<span class="n">Exception</span> <span class="p">{</span>
+    <span class="k">return</span> <span class="n">aLong</span> <span
class="o">+</span> <span class="n">aLong2</span><span class="p">;</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+
+<span class="cm">/**</span>
+<span class="cm">* metric function applied to the reduced results</span>
+<span class="cm">*/</span>
+<span class="n">private</span> <span class="k">static</span> <span
class="n">PairFunction</span><span class="o">&lt;</span><span
class="n">Tuple2</span><span class="o">&lt;</span><span class="n">Tuple2</span><span
class="o">&lt;</span><span class="n">String</span><span class="p">,</span>
<span class="n">Long</span><span class="o">&gt;</span><span
class="p">,</span> <span class="n">Long</span><span class="o">&gt;</span><span
class="p">,</span> <span class="n">String</span><span class="p">,</span>
<span class="n">MetricDatum</span><span class="o">&gt;</span>
<span class="n">metricFunc</span> <span class="o">=</span> <span
class="k">new</span> <span class="n">PairFunction</span><span class="o">&lt;</span><span
class="n">Tuple2</span><span class="o">&lt;</span><span class="n">Tuple2</span><span
class="o">&lt;</span><span class="n">String</span><span class="p">,</span>
<span class="n">Long</span><span class="o">&gt;</span><span
class="p">,</span> <span class="n">Long</span><span class="o">&gt;</span><span
class="p">,</span> <span class="n">String</span><span
class="p">,</span> <span class="n">MetricDatum</span><span class="o">&gt;</span><span
class="p">()</span> <span class="p">{</span>
+  <span class="p">@</span><span class="n">Override</span>
+  <span class="n">public</span> <span class="n">Tuple2</span><span
class="o">&lt;</span><span class="n">String</span><span class="p">,</span>
<span class="n">MetricDatum</span><span class="o">&gt;</span>
<span class="n">call</span><span class="p">(</span>
+    <span class="n">Tuple2</span><span class="o">&lt;</span><span
class="n">Tuple2</span><span class="o">&lt;</span><span class="n">String</span><span
class="p">,</span> <span class="n">Long</span><span class="o">&gt;</span><span
class="p">,</span> <span class="n">Long</span><span class="o">&gt;</span>
<span class="n">tuple2LongTuple2</span><span class="p">)</span> <span
class="n">throws</span> <span class="n">Exception</span> <span class="p">{</span>
+    <span class="n">String</span> <span class="n">dimension</span>
<span class="o">=</span> <span class="n">tuple2LongTuple2</span><span
class="p">.</span><span class="n">_1</span><span class="p">().</span><span
class="n">_1</span><span class="p">();</span>
+    <span class="n">long</span> <span class="n">timestamp</span>
<span class="o">=</span> <span class="n">tuple2LongTuple2</span><span
class="p">.</span><span class="n">_1</span><span class="p">().</span><span
class="n">_2</span><span class="p">();</span>
+    <span class="n">MetricDatum</span> <span class="n">metricDatum</span>
<span class="o">=</span> <span class="k">new</span> <span class="n">MetricDatum</span><span
class="p">();</span>
+    <span class="n">metricDatum</span><span class="p">.</span><span
class="n">setMetricDimension</span><span class="p">(</span><span class="n">dimension</span><span
class="p">);</span>
+    <span class="n">metricDatum</span><span class="p">.</span><span
class="n">setTimestamp</span><span class="p">(</span><span class="n">timestamp</span><span
class="p">);</span>
+    <span class="n">String</span> <span class="n">key</span> <span
class="o">=</span> <span class="n">metricDatum</span><span class="p">.</span><span
class="n">getMetricDimension</span><span class="p">().</span><span
class="n">toString</span><span class="p">();</span>
+    <span class="n">key</span> <span class="o">+=</span> <span
class="s">&quot;_&quot;</span> <span class="o">+</span> <span
class="n">Long</span><span class="p">.</span><span class="n">toString</span><span
class="p">(</span><span class="n">timestamp</span><span class="p">);</span>
+    <span class="n">metricDatum</span><span class="p">.</span><span
class="n">setMetric</span><span class="p">(</span><span class="n">tuple2LongTuple2</span><span
class="p">.</span><span class="n">_2</span><span class="p">());</span>
+    <span class="k">return</span> <span class="k">new</span> <span
class="n">Tuple2</span><span class="o">&lt;&gt;</span><span
class="p">(</span><span class="n">key</span><span class="p">,</span>
<span class="n">metricDatum</span><span class="p">);</span>
+  <span class="p">}</span>
+<span class="p">};</span>
+
+<span class="cm">/**</span>
+<span class="cm">* Rolls up the given timestamp to the day cardinality, so that data
can be aggregated daily</span>
+<span class="cm">*/</span>
+<span class="n">private</span> <span class="k">static</span> <span
class="n">long</span> <span class="n">getDay</span><span class="p">(</span><span
class="n">long</span> <span class="n">timeStamp</span><span class="p">)</span>
<span class="p">{</span>
+  <span class="k">return</span> <span class="p">(</span><span
class="n">timeStamp</span> <span class="o">/</span> <span class="no">DAY_MILIS</span><span
class="p">)</span> <span class="o">*</span> <span class="no">DAY_MILIS</span><span
class="p">;</span>
+<span class="p">}</span>
+</pre></div>
+
+
+<p>Here is how to run the map and reduce functions on the existing JavaPairRDD:</p>
+<div class="codehilite"><pre><span class="n">JavaRDD</span><span
class="o">&lt;</span><span class="n">Tuple2</span><span class="o">&lt;</span><span
class="n">Tuple2</span><span class="o">&lt;</span><span class="n">String</span><span
class="p">,</span> <span class="n">Long</span><span class="o">&gt;</span><span
class="p">,</span> <span class="n">Long</span><span class="o">&gt;&gt;</span>
<span class="n">mappedGoraRdd</span> <span class="p">=</span> <span
class="n">goraRDD</span><span class="p">.</span><span class="n">values</span><span
class="p">().</span><span class="n">map</span><span class="p">(</span><span
class="n">mapFunc</span><span class="p">);</span>
+<span class="n">JavaPairRDD</span><span class="o">&lt;</span><span
class="n">String</span><span class="p">,</span> <span class="n">MetricDatum</span><span
class="o">&gt;</span> <span class="n">reducedGoraRdd</span> <span
class="p">=</span> <span class="n">JavaPairRDD</span><span class="p">.</span><span
class="n">fromJavaRDD</span><span class="p">(</span><span class="n">mappedGoraRdd</span><span
class="p">).</span><span class="n">reduceByKey</span><span class="p">(</span><span
class="n">redFunc</span><span class="p">).</span><span class="n">mapToPair</span><span
class="p">(</span><span class="n">metricFunc</span><span class="p">);</span>
+</pre></div>
+
+
+<p>To persist the result into the output data store (Solr in our example),
do it as follows:</p>
+<div class="codehilite"><pre><span class="n">Configuration</span>
<span class="n">sparkHadoopConf</span> <span class="p">=</span> <span
class="n">goraSparkEngine</span><span class="p">.</span><span class="n">generateOutputConf</span><span
class="p">(</span><span class="n">outStore</span><span class="p">);</span>
+<span class="n">reducedGoraRdd</span><span class="p">.</span><span
class="n">saveAsNewAPIHadoopDataset</span><span class="p">(</span><span
class="n">sparkHadoopConf</span><span class="p">);</span>
+</pre></div>
+
+
+<p>That’s all! You can check Solr to verify the results.</p>
 <h2 id="more-examples">More Examples<a class="headerlink" href="#more-examples"
title="Permanent link">&para;</a></h2>
 <p>Other than this tutorial, there are several places that you can find 
 examples of Gora in action.</p>
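
Editorial note on the tutorial.html hunk above: the Spark job counts pageviews per (URL, day) pair and stores each count under a key of the form url_day, where the day is the timestamp rounded down by getDay. The sketch below reproduces only that aggregation logic in plain Java, without Spark, Gora, HBase or Solr, so the rounding and key format can be checked locally. The class DailyRollupSketch, the Hit type and the method countPerUrlAndDay are hypothetical names introduced for illustration; only DAY_MILIS and getDay mirror the committed code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class DailyRollupSketch {
  /** The number of milliseconds in a day (same value as DAY_MILIS in the tutorial). */
  static final long DAY_MILIS = 1000L * 60 * 60 * 24;

  /** Rolls a timestamp down to the start of its day, as getDay does in the tutorial. */
  static long getDay(long timeStamp) {
    return (timeStamp / DAY_MILIS) * DAY_MILIS;
  }

  /** Hypothetical stand-in for Pageview: only the two fields the job reads. */
  static final class Hit {
    final String url;
    final long timestamp;

    Hit(String url, long timestamp) {
      this.url = url;
      this.timestamp = timestamp;
    }
  }

  /**
   * Counts hits per "url_day" key. This is a local analogue of the
   * map -> reduceByKey -> metricFunc chain, not a use of the Spark API.
   */
  static Map<String, Long> countPerUrlAndDay(List<Hit> hits) {
    Map<String, Long> counts = new TreeMap<>();
    for (Hit hit : hits) {
      // Same key layout as metricFunc: dimension + "_" + day timestamp.
      String key = hit.url + "_" + Long.toString(getDay(hit.timestamp));
      // Same reduction as redFunc: add the per-record count of 1.
      counts.merge(key, 1L, Long::sum);
    }
    return counts;
  }

  public static void main(String[] args) {
    List<Hit> hits = Arrays.asList(
        new Hit("/index.html", 5L),
        new Hit("/index.html", DAY_MILIS - 1),
        new Hit("/index.html", DAY_MILIS + 7),
        new Hit("/about.html", 42L));
    // prints {/about.html_0=1, /index.html_0=2, /index.html_86400000=1}
    System.out.println(countPerUrlAndDay(hits));
  }
}
```

The first two /index.html hits fall on the same day and collapse into one key, while the third lands on the next day and gets its own key, which is the behaviour the Spark job should show in Solr for real data.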


