camel-commits mailing list archives

From build...@apache.org
Subject svn commit: r975554 - in /websites/production/camel/content: apache-spark.html cache/main.pageCache
Date Mon, 14 Dec 2015 21:19:47 GMT
Author: buildbot
Date: Mon Dec 14 21:19:46 2015
New Revision: 975554

Log:
Production update by buildbot for camel

Modified:
    websites/production/camel/content/apache-spark.html
    websites/production/camel/content/cache/main.pageCache

Modified: websites/production/camel/content/apache-spark.html
==============================================================================
--- websites/production/camel/content/apache-spark.html (original)
+++ websites/production/camel/content/apache-spark.html Mon Dec 14 21:19:46 2015
@@ -84,7 +84,7 @@
 	<tbody>
         <tr>
         <td valign="top" width="100%">
-<div class="wiki-content maincontent"><h2 id="ApacheSpark-ApacheSparkcomponent">Apache
Spark component</h2><div class="confluence-information-macro confluence-information-macro-information"><span
class="aui-icon aui-icon-small aui-iconfont-info confluence-information-macro-icon"></span><div
class="confluence-information-macro-body"><p>&#160;Apache Spark component is
available starting from Camel <strong>2.17</strong>.</p></div></div><p>&#160;</p><p><span
style="line-height: 1.5625;font-size: 16.0px;">This documentation page covers the <a
shape="rect" class="external-link" href="http://spark.apache.org/">Apache Spark</a>
component for the Apache Camel. The main purpose of the Spark integration with Camel is to
provide a bridge between Camel connectors and Spark tasks. In particular Camel connector provides
a way to route message from various transports, dynamically choose a task to execute, use
incoming message as input data for that task and finally deliver the results of the execut
 ion back to the Camel pipeline.</span></p><h3 id="ApacheSpark-Supportedarchitecturalstyles"><span>Supported
architectural styles</span></h3><p><span style="line-height: 1.5625;font-size:
16.0px;">Spark component can be used as a driver application deployed into an application
server (or executed as a fat jar).</span></p><p><span style="line-height:
1.5625;font-size: 16.0px;"><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img
class="confluence-embedded-image" height="250" src="apache-spark.data/camel_spark_driver.png"
data-image-src="/confluence/download/attachments/61331559/camel_spark_driver.png?version=2&amp;modificationDate=1449478362000&amp;api=v2"
data-unresolved-comment-count="0" data-linked-resource-id="61331563" data-linked-resource-version="2"
data-linked-resource-type="attachment" data-linked-resource-default-alias="camel_spark_driver.png"
data-base-url="https://cwiki.apache.org/confluence" data-linked-resource-content-type="image/png"
data-linked-resource-container-id="61331559" data-linked-resource-container-version="14"></span><br
clear="none"></span></p><p><span style="line-height: 1.5625;font-size:
16.0px;">Spark component can also be submitted as a job directly into the Spark cluster.</span></p><p><span
style="line-height: 1.5625;font-size: 16.0px;"><span class="confluence-embedded-file-wrapper
confluence-embedded-manual-size"><img class="confluence-embedded-image" height="250"
src="apache-spark.data/camel_spark_cluster.png" data-image-src="/confluence/download/attachments/61331559/camel_spark_cluster.png?version=1&amp;modificationDate=1449478393000&amp;api=v2"
data-unresolved-comment-count="0" data-linked-resource-id="61331565" data-linked-resource-version="1"
data-linked-resource-type="attachment" data-linked-resource-default-alias="camel_spark_cluster.png"
data-base-url="https://cwiki.apache.org/confluence" data-linked-resource-content-type="image/png"
data-linked-resource-container-id="61331559" data-linked-
 resource-container-version="14"></span><br clear="none"></span></p><p><span
style="line-height: 1.5625;font-size: 16.0px;">While Spark component is primary designed
to work as a <em>long running job</em>&#160;serving as an bridge between Spark
cluster and the other endpoints, you can also use it as a <em>fire-once</em> short
job. &#160;</span>&#160;</p><h3 id="ApacheSpark-RunningSparkinOSGiservers"><span>Running
Spark in OSGi servers</span></h3><p>Currently the Spark component doesn't
support execution in an OSGi container. Spark has been designed to be executed as a fat jar,
usually submitted as a job to a cluster. For those reasons, running Spark in an OSGi server
is at least challenging and is not supported by Camel either.</p><h3 id="ApacheSpark-URIformat">URI
format</h3><p>Currently the Spark component supports only producers - it is intended
to invoke a Spark job and return results. You can call an RDD, DataFrame, or Hive SQL job.</p><div><p>&#160;</p><div
class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width:
1px;"><b>Spark URI format</b></div><div class="codeContent panelContent
pdl">
+<div class="wiki-content maincontent"><h2 id="ApacheSpark-ApacheSparkcomponent">Apache
Spark component</h2><div class="confluence-information-macro confluence-information-macro-information"><span
class="aui-icon aui-icon-small aui-iconfont-info confluence-information-macro-icon"></span><div
class="confluence-information-macro-body"><p>&#160;Apache Spark component is
available starting from Camel <strong>2.17</strong>.</p></div></div><p>&#160;</p><p><span
style="line-height: 1.5625;font-size: 16.0px;">This documentation page covers the <a
shape="rect" class="external-link" href="http://spark.apache.org/">Apache Spark</a>
component for the Apache Camel. The main purpose of the Spark integration with Camel is to
provide a bridge between Camel connectors and Spark tasks. In particular Camel connector provides
a way to route message from various transports, dynamically choose a task to execute, use
incoming message as input data for that task and finally deliver the results of the execut
 ion back to the Camel pipeline.</span></p><h3 id="ApacheSpark-Supportedarchitecturalstyles"><span>Supported
architectural styles</span></h3><p><span style="line-height: 1.5625;font-size:
16.0px;">Spark component can be used as a driver application deployed into an application
server (or executed as a fat jar).</span></p><p><span style="line-height:
1.5625;font-size: 16.0px;"><span class="confluence-embedded-file-wrapper confluence-embedded-manual-size"><img
class="confluence-embedded-image" height="250" src="apache-spark.data/camel_spark_driver.png"
data-image-src="/confluence/download/attachments/61331559/camel_spark_driver.png?version=2&amp;modificationDate=1449478362000&amp;api=v2"
data-unresolved-comment-count="0" data-linked-resource-id="61331563" data-linked-resource-version="2"
data-linked-resource-type="attachment" data-linked-resource-default-alias="camel_spark_driver.png"
data-base-url="https://cwiki.apache.org/confluence" data-linked-resource-content-type="image/png"
data-linked-resource-container-id="61331559" data-linked-resource-container-version="15"></span><br
clear="none"></span></p><p><span style="line-height: 1.5625;font-size:
16.0px;">Spark component can also be submitted as a job directly into the Spark cluster.</span></p><p><span
style="line-height: 1.5625;font-size: 16.0px;"><span class="confluence-embedded-file-wrapper
confluence-embedded-manual-size"><img class="confluence-embedded-image" height="250"
src="apache-spark.data/camel_spark_cluster.png" data-image-src="/confluence/download/attachments/61331559/camel_spark_cluster.png?version=1&amp;modificationDate=1449478393000&amp;api=v2"
data-unresolved-comment-count="0" data-linked-resource-id="61331565" data-linked-resource-version="1"
data-linked-resource-type="attachment" data-linked-resource-default-alias="camel_spark_cluster.png"
data-base-url="https://cwiki.apache.org/confluence" data-linked-resource-content-type="image/png"
data-linked-resource-container-id="61331559" data-linked-
 resource-container-version="15"></span><br clear="none"></span></p><p><span
style="line-height: 1.5625;font-size: 16.0px;">While Spark component is primary designed
to work as a <em>long running job</em>&#160;serving as an bridge between Spark
cluster and the other endpoints, you can also use it as a <em>fire-once</em> short
job. &#160;</span>&#160;</p><h3 id="ApacheSpark-RunningSparkinOSGiservers"><span>Running
Spark in OSGi servers</span></h3><p>Currently the Spark component doesn't
support execution in an OSGi container. Spark has been designed to be executed as a fat jar,
usually submitted as a job to a cluster. For those reasons, running Spark in an OSGi server
is at least challenging and is not supported by Camel either.</p><h3 id="ApacheSpark-URIformat">URI
format</h3><p>Currently the Spark component supports only producers - it is intended
to invoke a Spark job and return results. You can call an RDD, DataFrame, or Hive SQL job.</p><div><p>&#160;</p><div
class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width:
1px;"><b>Spark URI format</b></div><div class="codeContent panelContent
pdl">
 <script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[spark:{rdd|dataframe|hive}]]></script>
 </div></div><p>&#160;</p></div><h3 id="ApacheSpark-RDDjobs">RDD
jobs&#160;</h3><p>&#160;</p><div>To invoke an RDD job, use
the following URI:</div><div class="code panel pdl" style="border-width: 1px;"><div
class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark RDD
producer</b></div><div class="codeContent panelContent pdl">
 <script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[spark:rdd?rdd=#testFileRdd&amp;rddCallback=#transformation]]></script>
@@ -204,7 +204,19 @@ DataFrame cars(HiveContext hiveContext)
  	jsonCars.registerTempTable(&quot;cars&quot;);
 	return jsonCars;
 }]]></script>
-</div></div><p>&#160;</p><h4 id="ApacheSpark-DataFramejobsoptions">DataFrame
jobs options</h4><div class="table-wrap"><table class="confluenceTable"><tbody><tr><th
colspan="1" rowspan="1" class="confluenceTh">Option</th><th colspan="1" rowspan="1"
class="confluenceTh">Description</th><th colspan="1" rowspan="1" class="confluenceTh">Default
value</th></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><code>dataFrame</code></td><td
colspan="1" rowspan="1" class="confluenceTd">DataFrame instance (subclass of&#160;<code><span>org.apache.spark.</span><span>sql</span>.DataFrame</code>).</td><td
colspan="1" rowspan="1" class="confluenceTd"><code>null</code></td></tr><tr><td
colspan="1" rowspan="1" class="confluenceTd"><code>dataFrameCallback</code></td><td
colspan="1" rowspan="1" class="confluenceTd">Instance of&#160;<code>org.apache.camel.component.spark.DataFrameCallback</code>&#160;interface.</td><td
colspan="1" rowspan="1" class="confluenceTd"><code><span style="color: rgb(0,51,
 102);">null </span></code></td></tr></tbody></table></div><p>&#160;</p><p></p><h3
id="ApacheSpark-SeeAlso">See Also</h3>
+</div></div><p>&#160;</p><h4 id="ApacheSpark-DataFramejobsoptions">DataFrame
jobs options</h4><div class="table-wrap"><table class="confluenceTable"><tbody><tr><th
colspan="1" rowspan="1" class="confluenceTh">Option</th><th colspan="1" rowspan="1"
class="confluenceTh">Description</th><th colspan="1" rowspan="1" class="confluenceTh">Default
value</th></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><code>dataFrame</code></td><td
colspan="1" rowspan="1" class="confluenceTd">DataFrame instance (subclass of&#160;<code><span>org.apache.spark.</span><span>sql</span>.DataFrame</code>).</td><td
colspan="1" rowspan="1" class="confluenceTd"><code>null</code></td></tr><tr><td
colspan="1" rowspan="1" class="confluenceTd"><code>dataFrameCallback</code></td><td
colspan="1" rowspan="1" class="confluenceTd">Instance of&#160;<code>org.apache.camel.component.spark.DataFrameCallback</code>&#160;interface.</td><td
colspan="1" rowspan="1" class="confluenceTd"><code><span style="color: rgb(0,51,
 102);">null </span></code></td></tr></tbody></table></div><p>&#160;</p><h3
id="ApacheSpark-Hivejobs">Hive jobs</h3><p>&#160;Instead of working with
RDDs or DataFrame Spark component can also receive Hive SQL queries as payloads.&#160;To
send Hive query to Spark component, use the following URI:</p><div class="code panel
pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width:
1px;"><b>Spark RDD producer</b></div><div class="codeContent panelContent
pdl">
+<script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[spark:hive]]></script>
+</div></div><p>The following snippet demonstrates how to send a message as
input to a job and return the results:</p><div class="code panel pdl" style="border-width:
1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Calling
spark job</b></div><div class="codeContent panelContent pdl">
+<script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[long
carsCount = template.requestBody(&quot;spark:hive?collect=false&quot;, &quot;SELECT
* FROM cars&quot;, Long.class);
+List&lt;Row&gt; cars = template.requestBody(&quot;spark:hive&quot;, &quot;SELECT
* FROM cars&quot;, List.class);]]></script>
+</div></div><p>The table we want to execute the query against should be registered
in a HiveContext before we query it. For example, in Spring such a registration could look as
follows:</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader
panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark DataFrame definition</b></div><div
class="codeContent panelContent pdl">
+<script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[@Bean
+DataFrame cars(HiveContext hiveContext) {
+  	DataFrame jsonCars = hiveContext.read().json(&quot;/var/data/cars.json&quot;);
+ 	jsonCars.registerTempTable(&quot;cars&quot;);
+	return jsonCars;
+}]]></script>
+</div></div><p>&#160;</p><h4 id="ApacheSpark-Hivejobsoptions">Hive
jobs options</h4><div class="table-wrap"><table class="confluenceTable"><tbody><tr><th
colspan="1" rowspan="1" class="confluenceTh">Option</th><th colspan="1" rowspan="1"
class="confluenceTh">Description</th><th colspan="1" rowspan="1" class="confluenceTh">Default
value</th></tr><tr><td colspan="1" rowspan="1" class="confluenceTd"><code>collect</code></td><td
colspan="1" rowspan="1" class="confluenceTd">Indicates if results should be collected (as
a list of <code>org.apache.spark.sql.Row</code> instances) or if <code>count()</code>
should be called against those.</td><td colspan="1" rowspan="1" class="confluenceTd"><code>true</code></td></tr></tbody></table></div><p>&#160;</p><p></p><h3
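<p>For illustration, here is a sketch of using the collect option from a Camel route rather than a ProducerTemplate (a fragment for a RouteBuilder's configure() method; the direct:query endpoint and the log step are hypothetical, not part of this page):</p><div class="code panel pdl" style="border-width: 1px;"><div class="codeHeader panelHeader pdl" style="border-bottom-width: 1px;"><b>Spark Hive producer in a route (sketch)</b></div><div class="codeContent panelContent pdl">
<script class="brush: java; gutter: false; theme: Default" type="syntaxhighlighter"><![CDATA[// Routes the incoming Hive SQL query to Spark; collect=false returns only the row count.
from(&quot;direct:query&quot;)
    .to(&quot;spark:hive?collect=false&quot;)
    .log(&quot;Matching rows: ${body}&quot;);]]></script>
</div></div><h3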
id="ApacheSpark-SeeAlso">See Also</h3>
 <ul><li><a shape="rect" href="configuring-camel.html">Configuring Camel</a></li><li><a
shape="rect" href="component.html">Component</a></li><li><a shape="rect"
href="endpoint.html">Endpoint</a></li><li><a shape="rect" href="getting-started.html">Getting
Started</a></li></ul></div>
         </td>
         <td valign="top">

Modified: websites/production/camel/content/cache/main.pageCache
==============================================================================
Binary files - no diff available.


