accumulo-commits mailing list archives

From mwa...@apache.org
Subject [accumulo-website] branch asf-site updated: Jekyll build from master:d70ec3b
Date Mon, 07 Jan 2019 19:27:25 GMT
This is an automated email from the ASF dual-hosted git repository.

mwalch pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/accumulo-website.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 9ea764e  Jekyll build from master:d70ec3b
9ea764e is described below

commit 9ea764e504da21fe3c82714941a64dfaae89702d
Author: Mike Walch <mwalch@apache.org>
AuthorDate: Mon Jan 7 14:27:01 2019 -0500

    Jekyll build from master:d70ec3b
    
    More updates to MapReduce docs (#142)
---
 docs/2.x/administration/upgrading.html | 11 ++++-
 docs/2.x/development/mapreduce.html    | 79 +++++++++++++++++++++++++++++++---
 feed.xml                               |  4 +-
 search_data.json                       |  4 +-
 4 files changed, 86 insertions(+), 12 deletions(-)

diff --git a/docs/2.x/administration/upgrading.html b/docs/2.x/administration/upgrading.html
index c265da8..72f042e 100644
--- a/docs/2.x/administration/upgrading.html
+++ b/docs/2.x/administration/upgrading.html
@@ -479,7 +479,7 @@ distributions of Hadoop.</li>
       <li><code class="highlighter-rouge">log4j.properties</code> for Accumulo
clients and commands</li>
     </ul>
   </li>
-  <li><a href="/docs/2.x/development/mapreduce#configuration">New Hadoop configuration
is required</a> when reading or writing to Accumulo using MapReduce.</li>
+  <li>MapReduce jobs that read from or write to Accumulo <a href="/docs/2.x/development/mapreduce#configure-dependencies-for-your-mapreduce-job">must configure their dependencies differently</a>.</li>
   <li>Run the command <code class="highlighter-rouge">accumulo shell</code>
to access the shell using configuration in <code class="highlighter-rouge">conf/accumulo-client.properties</code></li>
 </ul>
 
@@ -524,6 +524,15 @@ that users start using the new API, the old API will continue to be supported
th
       <li><a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/2.0.0-alpha-1/org/apache/accumulo/core/client/Connector.html">Connector</a>
objects can be created from an <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/2.0.0-alpha-1/org/apache/accumulo/core/client/AccumuloClient.html">AccumuloClient</a>
object using <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-core/2.0.0-alpha-1/org/apache/accumulo/core/client/Connector.html#from-org
[...]
     </ul>
   </li>
+  <li>Accumulo’s <a href="/docs/2.x/development/mapreduce">MapReduce API</a>
has changed in 2.0.
+    <ul>
+      <li>A new API has been introduced in the <code class="highlighter-rouge">org.apache.accumulo.hadoop</code>
package of the <code class="highlighter-rouge">accumulo-hadoop-mapreduce</code>
jar.</li>
+      <li>The old API in the <code class="highlighter-rouge">org.apache.accumulo.core.client</code> package of the <code class="highlighter-rouge">accumulo-core</code> jar has been deprecated and will
+eventually be removed.</li>
+      <li>For both the old and new APIs, you must <a href="/docs/2.x/development/mapreduce#configure-dependencies-for-your-mapreduce-job">configure dependencies differently</a>
+when creating your MapReduce job.</li>
+    </ul>
+  </li>
 </ul>
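The upgrade notes say the new MapReduce classes live in `org.apache.accumulo.hadoop` (in the `accumulo-hadoop-mapreduce` jar), while the deprecated ones live in `org.apache.accumulo.core.client` (in `accumulo-core`). As a rough illustration of that package move only, the hypothetical Java helper below (`AccumuloApiMigration` is not part of Accumulo) maps the fully qualified name of one of the three listed format classes from its old location to its new one. It assumes the pre-2.0 classes sit in the `mapreduce` and `mapred` subpackages of `org.apache.accumulo.core.client`, and it returns `null` for anything else. The configuration calls also differ between the two APIs, so renaming imports alone will not migrate a job.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper (not part of Accumulo): maps a deprecated pre-2.0 MapReduce
 * format class name to its counterpart in the accumulo-hadoop-mapreduce jar.
 * Only the package is rewritten; job configuration code must still be updated by hand.
 */
final class AccumuloApiMigration {

  // Old package prefix -> new package prefix, for Hadoop's mapreduce and mapred APIs.
  private static final Map<String, String> PACKAGE_MOVES = new LinkedHashMap<>();

  static {
    PACKAGE_MOVES.put("org.apache.accumulo.core.client.mapreduce.",
        "org.apache.accumulo.hadoop.mapreduce.");
    PACKAGE_MOVES.put("org.apache.accumulo.core.client.mapred.",
        "org.apache.accumulo.hadoop.mapred.");
  }

  // The three format classes listed in the 2.0 MapReduce documentation.
  private static final String[] FORMAT_CLASSES =
      {"AccumuloInputFormat", "AccumuloOutputFormat", "AccumuloFileOutputFormat"};

  /** Returns the 2.0 class name, or null if the input is not one of the listed formats. */
  static String toNewApi(String oldClassName) {
    for (Map.Entry<String, String> move : PACKAGE_MOVES.entrySet()) {
      if (oldClassName.startsWith(move.getKey())) {
        String simpleName = oldClassName.substring(move.getKey().length());
        for (String format : FORMAT_CLASSES) {
          if (format.equals(simpleName)) {
            return move.getValue() + simpleName;
          }
        }
      }
    }
    return null; // no known counterpart among the listed classes
  }

  public static void main(String[] args) {
    // prints org.apache.accumulo.hadoop.mapred.AccumuloOutputFormat
    System.out.println(toNewApi("org.apache.accumulo.core.client.mapred.AccumuloOutputFormat"));
  }
}
```

The `mapred.` prefix includes the trailing dot, so it never matches names in the `mapreduce` package.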
 
 <h2 id="upgrading-from-17-to-18">Upgrading from 1.7 to 1.8</h2>
diff --git a/docs/2.x/development/mapreduce.html b/docs/2.x/development/mapreduce.html
index 6b349a1..fc1e310 100644
--- a/docs/2.x/development/mapreduce.html
+++ b/docs/2.x/development/mapreduce.html
@@ -432,10 +432,49 @@
 
 <h2 id="general-mapreduce-configuration">General MapReduce configuration</h2>
 
-<p>Since 2.0.0, Accumulo no longer has the same dependency versions (i.e Guava, etc)
as Hadoop.
-When launching a MapReduce job that reads or writes to Accumulo, you should build a shaded
jar
-with all of your dependencies and complete the following steps so YARN only includes Hadoop
code
-(and not all of Hadoop dependencies) when running your MapReduce job:</p>
+<h3 id="add-accumulos-mapreduce-api-to-your-dependencies">Add Accumulo’s MapReduce
API to your dependencies</h3>
+
+<p>If you are using Maven, add the following dependency to your <code class="highlighter-rouge">pom.xml</code>
to use Accumulo’s MapReduce API:</p>
+
+<div class="language-xml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span
class="nt">&lt;dependency&gt;</span>
+  <span class="nt">&lt;groupId&gt;</span>org.apache.accumulo<span
class="nt">&lt;/groupId&gt;</span>
+  <span class="nt">&lt;artifactId&gt;</span>accumulo-hadoop-mapreduce<span
class="nt">&lt;/artifactId&gt;</span>
+  <span class="nt">&lt;version&gt;</span>2.0.0-alpha-1<span class="nt">&lt;/version&gt;</span>
+<span class="nt">&lt;/dependency&gt;</span>
+</code></pre></div></div>
+
+<p>The MapReduce API consists of the following classes:</p>
+
+<ul>
+  <li>If using Hadoop’s <strong>mapreduce</strong> API:
+    <ul>
+      <li><a href="https://static.javadoc.io/org.apache.accumulo/accumulo-hadoop-mapreduce/2.0.0-alpha-1/org/apache/accumulo/hadoop/mapreduce/AccumuloInputFormat.html">org.apache.accumulo.hadoop.mapreduce.AccumuloInputFormat</a></li>
+      <li><a href="https://static.javadoc.io/org.apache.accumulo/accumulo-hadoop-mapreduce/2.0.0-alpha-1/org/apache/accumulo/hadoop/mapreduce/AccumuloOutputFormat.html">org.apache.accumulo.hadoop.mapreduce.AccumuloOutputFormat</a></li>
+      <li><a href="https://static.javadoc.io/org.apache.accumulo/accumulo-hadoop-mapreduce/2.0.0-alpha-1/org/apache/accumulo/hadoop/mapreduce/AccumuloFileOutputFormat.html">org.apache.accumulo.hadoop.mapreduce.AccumuloFileOutputFormat</a></li>
+    </ul>
+  </li>
+  <li>If using Hadoop’s <strong>mapred</strong> API:
+    <ul>
+      <li><a href="https://static.javadoc.io/org.apache.accumulo/accumulo-hadoop-mapreduce/2.0.0-alpha-1/org/apache/accumulo/hadoop/mapred/AccumuloInputFormat.html">org.apache.accumulo.hadoop.mapred.AccumuloInputFormat</a></li>
+      <li><a href="https://static.javadoc.io/org.apache.accumulo/accumulo-hadoop-mapreduce/2.0.0-alpha-1/org/apache/accumulo/hadoop/mapred/AccumuloOutputFormat.html">org.apache.accumulo.hadoop.mapred.AccumuloOutputFormat</a></li>
+      <li><a href="https://static.javadoc.io/org.apache.accumulo/accumulo-hadoop-mapreduce/2.0.0-alpha-1/org/apache/accumulo/hadoop/mapred/AccumuloFileOutputFormat.html">org.apache.accumulo.hadoop.mapred.AccumuloFileOutputFormat</a></li>
+    </ul>
+  </li>
+</ul>
+
+<p>Before 2.0, the MapReduce API resided in the <code class="highlighter-rouge">org.apache.accumulo.core.client</code>
package of the <code class="highlighter-rouge">accumulo-core</code> jar.
+While this old API still exists and can be used, it has been deprecated and will be removed
eventually.</p>
+
+<h3 id="configure-dependencies-for-your-mapreduce-job">Configure dependencies for your
MapReduce job</h3>
+
+<p>Before 2.0, Accumulo used the same dependency versions (such as Guava) as Hadoop. This allowed
+MapReduce jobs to run with both Accumulo’s and Hadoop’s dependencies on the classpath.</p>
+
+<p>Since 2.0, Accumulo no longer uses the same dependency versions as Hadoop. While this allows
+Accumulo to update its dependencies more frequently, it can cause version conflicts if both Accumulo’s and
+Hadoop’s dependencies are on the classpath of the MapReduce job. When launching a MapReduce job that
+uses Accumulo, you should build a shaded jar with all of your dependencies and complete the following
+steps so YARN only includes Hadoop code (and not all of Hadoop’s dependencies) when running your MapReduce job:</p>
 
 <ol>
   <li>
@@ -467,7 +506,7 @@ your job with <code class="highlighter-rouge">yarn</code>
command.</p>
   <li>
     <p>Configure your MapReduce job to use <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-hadoop-mapreduce/2.0.0-alpha-1/org/apache/accumulo/hadoop/mapreduce/AccumuloInputFormat.html">AccumuloInputFormat</a>.</p>
 
-    <div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code> <span class="n">Job</span> <span class="n">job</span>
<span class="o">=</span> <span class="n">Job</span><span class="o">.</span><span
class="na">getInstance</span><span class="o">(</span><span class="n">getConf</span><span
class="o">());</span>
+    <div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code> <span class="n">Job</span> <span class="n">job</span>
<span class="o">=</span> <span class="n">Job</span><span class="o">.</span><span
class="na">getInstance</span><span class="o">();</span>
  <span class="n">job</span><span class="o">.</span><span class="na">setInputFormatClass</span><span
class="o">(</span><span class="n">AccumuloInputFormat</span><span
class="o">.</span><span class="na">class</span><span class="o">);</span>
  <span class="n">Properties</span> <span class="n">props</span> <span
class="o">=</span> <span class="n">Accumulo</span><span class="o">.</span><span
class="na">newClientProperties</span><span class="o">().</span><span
class="na">to</span><span class="o">(</span><span class="s">"myinstance"</span><span
class="o">,</span><span class="s">"zoo1,zoo2"</span><span class="o">)</span>
                          <span class="o">.</span><span class="na">as</span><span
class="o">(</span><span class="s">"user"</span><span class="o">,</span>
<span class="s">"passwd"</span><span class="o">).</span><span class="na">build</span><span
class="o">();</span>
@@ -488,7 +527,7 @@ your job with <code class="highlighter-rouge">yarn</code>
command.</p>
      <span class="o">.</span><span class="na">store</span><span
class="o">(</span><span class="n">job</span><span class="o">);</span>
 </code></pre></div>    </div>
     <p><a href="https://static.javadoc.io/org.apache.accumulo/accumulo-hadoop-mapreduce/2.0.0-alpha-1/org/apache/accumulo/hadoop/mapreduce/AccumuloInputFormat.html">AccumuloInputFormat</a>
can also be configured to read from multiple Accumulo tables.</p>
-    <div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code> <span class="n">Job</span> <span class="n">job</span>
<span class="o">=</span> <span class="n">Job</span><span class="o">.</span><span
class="na">getInstance</span><span class="o">(</span><span class="n">getConf</span><span
class="o">());</span>
+    <div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code> <span class="n">Job</span> <span class="n">job</span>
<span class="o">=</span> <span class="n">Job</span><span class="o">.</span><span
class="na">getInstance</span><span class="o">();</span>
  <span class="n">job</span><span class="o">.</span><span class="na">setInputFormatClass</span><span
class="o">(</span><span class="n">AccumuloInputFormat</span><span
class="o">.</span><span class="na">class</span><span class="o">);</span>
  <span class="n">Properties</span> <span class="n">props</span> <span
class="o">=</span> <span class="n">Accumulo</span><span class="o">.</span><span
class="na">newClientProperties</span><span class="o">().</span><span
class="na">to</span><span class="o">(</span><span class="s">"myinstance"</span><span
class="o">,</span><span class="s">"zoo1,zoo2"</span><span class="o">)</span>
                          <span class="o">.</span><span class="na">as</span><span
class="o">(</span><span class="s">"user"</span><span class="o">,</span>
<span class="s">"passwd"</span><span class="o">).</span><span class="na">build</span><span
class="o">();</span>
@@ -533,7 +572,7 @@ your job with <code class="highlighter-rouge">yarn</code>
command.</p>
  options.</p>
   </li>
   <li>Configure your MapReduce job to use <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-hadoop-mapreduce/2.0.0-alpha-1/org/apache/accumulo/hadoop/mapreduce/AccumuloOutputFormat.html">AccumuloOutputFormat</a>.
-    <div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code> <span class="n">Job</span> <span class="n">job</span>
<span class="o">=</span> <span class="n">Job</span><span class="o">.</span><span
class="na">getInstance</span><span class="o">(</span><span class="n">getConf</span><span
class="o">());</span>
+    <div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code> <span class="n">Job</span> <span class="n">job</span>
<span class="o">=</span> <span class="n">Job</span><span class="o">.</span><span
class="na">getInstance</span><span class="o">();</span>
  <span class="n">job</span><span class="o">.</span><span class="na">setOutputFormatClass</span><span
class="o">(</span><span class="n">AccumuloOutputFormat</span><span
class="o">.</span><span class="na">class</span><span class="o">);</span>
  <span class="n">Properties</span> <span class="n">props</span> <span
class="o">=</span> <span class="n">Accumulo</span><span class="o">.</span><span
class="na">newClientProperties</span><span class="o">().</span><span
class="na">to</span><span class="o">(</span><span class="s">"myinstance"</span><span
class="o">,</span><span class="s">"zoo1,zoo2"</span><span class="o">)</span>
                          <span class="o">.</span><span class="na">as</span><span
class="o">(</span><span class="s">"user"</span><span class="o">,</span>
<span class="s">"passwd"</span><span class="o">).</span><span class="na">build</span><span
class="o">();</span>
@@ -543,6 +582,32 @@ your job with <code class="highlighter-rouge">yarn</code>
command.</p>
   </li>
 </ol>
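If a job still fails with errors such as `NoSuchMethodError` after following the dependency steps, it can help to check which jar a conflicting class was actually loaded from. The sketch below is a hypothetical helper built only on the JDK (`WhichJar` is not part of Accumulo or Hadoop); `com.google.common.base.Preconditions` is just an example of a Guava class to look up. It reports the location through the class loader that loaded the helper itself, so calling it from your job's own code shows whether a class came from your shaded jar or from Hadoop's classpath.

```java
import java.security.CodeSource;

/**
 * Hypothetical diagnostic (not part of Accumulo or Hadoop): reports where a class was
 * loaded from, e.g. to see whether Guava comes from your shaded jar or from Hadoop.
 */
final class WhichJar {

  /** Returns the jar or directory URL for className, "bootstrap" for JDK core classes, or "not found". */
  static String locate(String className) {
    try {
      // initialize=false loads the class without running its static initializers.
      // The loader used is the one that loaded WhichJar itself.
      Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
      CodeSource source = cls.getProtectionDomain().getCodeSource();
      if (source == null || source.getLocation() == null) {
        // Classes from the bootstrap loader (such as java.lang.String) have no code source.
        return "bootstrap";
      }
      return source.getLocation().toString();
    } catch (ClassNotFoundException e) {
      return "not found";
    }
  }

  public static void main(String[] args) {
    String[] names = args.length > 0
        ? args
        : new String[] {"com.google.common.base.Preconditions", "java.lang.String"};
    for (String name : names) {
      System.out.println(name + " -> " + locate(name));
    }
  }
}
```

For example, logging `WhichJar.locate("com.google.common.base.Preconditions")` from a mapper's `setup` method would show the jar the task used. If your shaded jar relocates Guava, look up the relocated class name instead.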
 
+<h2 id="write-output-to-rfiles-in-hdfs">Write output to RFiles in HDFS</h2>
+
+<p>Follow the steps below to have a MapReduce job write its output to RFiles in HDFS. These files
+can then be bulk imported into Accumulo:</p>
+
+<ol>
+  <li>Create a Mapper or Reducer with <code class="highlighter-rouge">Key</code> and <code class="highlighter-rouge">Value</code> as its output types.
+    <div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code> <span class="kd">class</span> <span class="nc">MyReducer</span>
<span class="kd">extends</span> <span class="n">Reducer</span><span
class="o">&lt;</span><span class="n">WritableComparable</span><span
class="o">,</span> <span class="n">Writable</span><span class="o">,</span>
<span class="n">Key</span><span class="o">,</span> <span class="n">Value</span><span
class="o">&gt;</span> <span cl [...]
+     <span class="kd">public</span> <span class="kt">void</span>
<span class="nf">reduce</span><span class="o">(</span><span class="n">WritableComparable</span>
<span class="n">key</span><span class="o">,</span> <span class="n">Iterable</span><span
class="o">&lt;</span><span class="n">Writable</span><span class="o">&gt;</span>
<span class="n">values</span><span class="o">,</span> <span class="n">Context</span>
<span class="n">c</span><span class="o">)</span>
<span class="kd">throws</span> <span class="n">IOException</span><span class="o">,</span> <span class="n">InterruptedException</span> <span class="o">{</span>
+         <span class="n">Key</span> <span class="n">outKey</span><span
class="o">;</span>
+         <span class="n">Value</span> <span class="n">outValue</span><span
class="o">;</span>
+         <span class="c1">// create outKey &amp; outValue based on key and values</span>
+         <span class="n">c</span><span class="o">.</span><span
class="na">write</span><span class="o">(</span><span class="n">outKey</span><span
class="o">,</span> <span class="n">outValue</span><span class="o">);</span>
+     <span class="o">}</span>
+ <span class="o">}</span>
+</code></pre></div>    </div>
+  </li>
+  <li>Configure your MapReduce job to use <a href="https://static.javadoc.io/org.apache.accumulo/accumulo-hadoop-mapreduce/2.0.0-alpha-1/org/apache/accumulo/hadoop/mapreduce/AccumuloFileOutputFormat.html">AccumuloFileOutputFormat</a>.
+    <div class="language-java highlighter-rouge"><div class="highlight"><pre
class="highlight"><code> <span class="n">Job</span> <span class="n">job</span>
<span class="o">=</span> <span class="n">Job</span><span class="o">.</span><span
class="na">getInstance</span><span class="o">();</span>
+ <span class="n">job</span><span class="o">.</span><span class="na">setOutputFormatClass</span><span
class="o">(</span><span class="n">AccumuloFileOutputFormat</span><span
class="o">.</span><span class="na">class</span><span class="o">);</span>
+ <span class="n">AccumuloFileOutputFormat</span><span class="o">.</span><span
class="na">configure</span><span class="o">()</span>
+     <span class="o">.</span><span class="na">outputPath</span><span
class="o">(</span><span class="k">new</span> <span class="n">Path</span><span
class="o">(</span><span class="s">"hdfs://localhost:8020/myoutput/"</span><span
class="o">)).</span><span class="na">store</span><span class="o">(</span><span
class="n">job</span><span class="o">);</span>
+</code></pre></div>    </div>
+  </li>
+</ol>
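RFiles store entries in sorted key order, and as far as we understand, appending a key smaller than the previous one fails. Hadoop's shuffle sorts a reducer's input keys, but when the output `Key` is derived from something else (as in the `MyReducer` example, whose input key is a generic `WritableComparable`), it is the emitted order that matters. The sketch below is a simplified, hypothetical model of Accumulo's key ordering using the JDK only (`KeyOrderSketch` and `SimpleKey` are illustrative names): row, column family, column qualifier, and visibility compare as unsigned bytes in ascending order, and timestamps sort newest first. It leaves out the delete flag and is meant for reasoning about order, not as a replacement for `org.apache.accumulo.core.data.Key`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Simplified, hypothetical model of Accumulo key ordering (delete flag omitted). */
final class KeyOrderSketch {

  static final class SimpleKey {
    final byte[] row, family, qualifier, visibility;
    final long timestamp;

    SimpleKey(String row, String family, String qualifier, String visibility, long timestamp) {
      this.row = row.getBytes(StandardCharsets.UTF_8);
      this.family = family.getBytes(StandardCharsets.UTF_8);
      this.qualifier = qualifier.getBytes(StandardCharsets.UTF_8);
      this.visibility = visibility.getBytes(StandardCharsets.UTF_8);
      this.timestamp = timestamp;
    }

    @Override
    public String toString() {
      return new String(row, StandardCharsets.UTF_8) + " "
          + new String(family, StandardCharsets.UTF_8) + ":"
          + new String(qualifier, StandardCharsets.UTF_8) + " " + timestamp;
    }
  }

  /** Lexicographic comparison treating each byte as unsigned (0..255). */
  static int compareUnsigned(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length; // a shorter prefix sorts first
  }

  /** Row, family, qualifier, visibility ascending; timestamp descending (newest first). */
  static final Comparator<SimpleKey> KEY_ORDER = (x, y) -> {
    int cmp = compareUnsigned(x.row, y.row);
    if (cmp == 0) cmp = compareUnsigned(x.family, y.family);
    if (cmp == 0) cmp = compareUnsigned(x.qualifier, y.qualifier);
    if (cmp == 0) cmp = compareUnsigned(x.visibility, y.visibility);
    if (cmp == 0) cmp = Long.compare(y.timestamp, x.timestamp);
    return cmp;
  };

  /** True if the keys could be appended in this sequence without going backwards. */
  static boolean isSorted(List<SimpleKey> keys) {
    for (int i = 1; i < keys.size(); i++) {
      if (KEY_ORDER.compare(keys.get(i - 1), keys.get(i)) > 0) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    List<SimpleKey> keys = new ArrayList<>(Arrays.asList(
        new SimpleKey("row2", "cf", "cq", "", 5L),
        new SimpleKey("row1", "cf", "cq", "", 1L),
        new SimpleKey("row1", "cf", "cq", "", 9L)));
    System.out.println("sorted before: " + isSorted(keys)); // false
    keys.sort(KEY_ORDER);
    keys.forEach(System.out::println); // row1 ... 9, row1 ... 1, row2 ... 5
  }
}
```

If a job cannot guarantee this order in what it emits, one option is to make the Accumulo `Key` itself the map output key so that the shuffle sorts it before the reducer writes.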
+
 <p>The <a href="https://github.com/apache/accumulo-examples/blob/master/docs/mapred.md">MapReduce
example</a> contains a complete example of using MapReduce with Accumulo.</p>
 
 
diff --git a/feed.xml b/feed.xml
index 6530228..a7bcc03 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,8 +6,8 @@
 </description>
     <link>https://accumulo.apache.org/</link>
     <atom:link href="https://accumulo.apache.org/feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Fri, 04 Jan 2019 13:40:54 -0500</pubDate>
-    <lastBuildDate>Fri, 04 Jan 2019 13:40:54 -0500</lastBuildDate>
+    <pubDate>Mon, 07 Jan 2019 14:26:50 -0500</pubDate>
+    <lastBuildDate>Mon, 07 Jan 2019 14:26:50 -0500</lastBuildDate>
     <generator>Jekyll v3.7.3</generator>
     
     
diff --git a/search_data.json b/search_data.json
index 554671f..90f31f2 100644
--- a/search_data.json
+++ b/search_data.json
@@ -51,7 +51,7 @@
   
     "docs-2-x-administration-upgrading": {
       "title": "Upgrading Accumulo",
-      "content"	 : "Upgrading from 1.8/9 to 2.0Follow the steps below to upgrade your Accumulo
instance and client to 2.0.Upgrade Accumulo instanceIMPORTANT! Before upgrading to Accumulo
2.0, you will need to upgrade to Java 8 and Hadoop 3.x.Upgrading to Accumulo 2.0 is done by
stopping Accumulo 1.8/9 and starting Accumulo 2.0.Before stopping Accumulo 1.8/9, install
Accumulo 2.0 and configure it by following the 2.0 quick start.There are several changes to
scripts and configuration in 2. [...]
+      "content"	 : "Upgrading from 1.8/9 to 2.0Follow the steps below to upgrade your Accumulo
instance and client to 2.0.Upgrade Accumulo instanceIMPORTANT! Before upgrading to Accumulo
2.0, you will need to upgrade to Java 8 and Hadoop 3.x.Upgrading to Accumulo 2.0 is done by
stopping Accumulo 1.8/9 and starting Accumulo 2.0.Before stopping Accumulo 1.8/9, install
Accumulo 2.0 and configure it by following the 2.0 quick start.There are several changes to
scripts and configuration in 2. [...]
       "url": " /docs/2.x/administration/upgrading",
       "categories": "administration"
     },
@@ -107,7 +107,7 @@
   
     "docs-2-x-development-mapreduce": {
       "title": "MapReduce",
-      "content"	 : "Accumulo tables can be used as the source and destination of MapReduce
jobs.General MapReduce configurationSince 2.0.0, Accumulo no longer has the same dependency
versions (i.e Guava, etc) as Hadoop.When launching a MapReduce job that reads or writes to
Accumulo, you should build a shaded jarwith all of your dependencies and complete the following
steps so YARN only includes Hadoop code(and not all of Hadoop dependencies) when running your
MapReduce job:      Set expo [...]
+      "content"	 : "Accumulo tables can be used as the source and destination of MapReduce
jobs.General MapReduce configurationAdd Accumulo’s MapReduce API to your dependenciesIf
you are using Maven, add the following dependency to your pom.xml to use Accumulo’s MapReduce
API:&amp;lt;dependency&amp;gt;  &amp;lt;groupId&amp;gt;org.apache.accumulo&amp;lt;/groupId&amp;gt;
 &amp;lt;artifactId&amp;gt;accumulo-hadoop-mapreduce&amp;lt;/artifactId&amp;gt;
 &amp;lt;version&amp;gt;2.0.0-alpha-1&am [...]
       "url": " /docs/2.x/development/mapreduce",
       "categories": "development"
     },

