From commits-return-22475-archive-asf-public=cust-asf.ponee.io@accumulo.apache.org Mon Jan 7 20:27:28 2019
Before 2.0, Accumulo used the same versions for dependencies (such as Guava) as Hadoop. This allowed
MapReduce jobs to run with both Accumulo’s and Hadoop’s dependencies on the classpath. Since 2.0,
Accumulo no longer has the same versions for dependencies as Hadoop. While this allows Accumulo to
update its dependencies more frequently, it can cause problems if both Accumulo’s and Hadoop’s
dependencies are on the classpath of a MapReduce job. When launching a MapReduce job that uses
Accumulo, you should build a shaded jar with all of your dependencies and complete the following
steps so YARN only includes Hadoop code (and not all of Hadoop’s dependencies) when running your
MapReduce job.

If you are using Maven, add the accumulo-hadoop-mapreduce dependency to your pom.xml to use
Accumulo’s MapReduce API. Since 2.0, the MapReduce API resides in the org.apache.accumulo.hadoop
package of the accumulo-hadoop-mapreduce jar. Before 2.0, it resided in the
org.apache.accumulo.core.client package of the accumulo-core jar; that old API has been deprecated
and will eventually be removed.

Use log4j.properties to configure logging for Accumulo clients and commands, and run accumulo shell
to access the shell using the configuration in conf/accumulo-client.properties.
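The shaded-jar step above can be sketched with the maven-shade-plugin; a minimal pom.xml fragment follows (the plugin version shown is an assumption, so substitute a current release):

```xml
<!-- Binds the shade goal to the package phase so `mvn package`
     produces a single jar containing the job and its dependencies. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```

Because YARN supplies Hadoop’s own classes at runtime, Hadoop dependencies can typically be marked `provided` so they are not bundled into the shaded jar.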
diff --git a/docs/2.x/development/mapreduce.html b/docs/2.x/development/mapreduce.html
index 6b349a1..fc1e310 100644
--- a/docs/2.x/development/mapreduce.html
+++ b/docs/2.x/development/mapreduce.html
@@ -432,10 +432,49 @@
General MapReduce configuration
-Add Accumulo’s MapReduce API to your dependencies
+
+pom.xml
to use Accumulo’s MapReduce API:
<dependency>
+ <groupId>org.apache.accumulo</groupId>
+ <artifactId>accumulo-hadoop-mapreduce</artifactId>
+ <version>2.0.0-alpha-1</version>
+</dependency>
+
+
+
+org.apache.accumulo.core.client
package of the accumulo-core
jar.
+While this old API still exists and can be used, it has been deprecated and will be removed eventually.
+Configure dependencies for your MapReduce job
+
+
yarn
command.
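Launching the shaded jar with the yarn command might look like the following sketch; the jar path and main-class name are hypothetical:

```
mvn package
yarn jar target/my-mapreduce-job-shaded.jar com.example.MyMapReduceJob
```

Running through `yarn jar` lets YARN add only Hadoop’s own code to the classpath, while everything else the job needs ships inside the shaded jar.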
Configure your MapReduce job to use AccumuloInputFormat.
- Job job = Job.getInstance(getConf());
+ Job job = Job.getInstance();
job.setInputFormatClass(AccumuloInputFormat.class);
Properties props = Accumulo.newClientProperties().to("myinstance","zoo1,zoo2")
.as("user", "passwd").build();
@@ -488,7 +527,7 @@ your job with yarn
command.
.store(job);
AccumuloInputFormat can also be configured to read from multiple Accumulo tables.
- Job job = Job.getInstance(getConf());
+ Job job = Job.getInstance();
job.setInputFormatClass(AccumuloInputFormat.class);
Properties props = Accumulo.newClientProperties().to("myinstance","zoo1,zoo2")
.as("user", "passwd").build();
@@ -533,7 +572,7 @@ your job with yarn
command.
options.
Configure your MapReduce job to use AccumuloOutputFormat.
- Job job = Job.getInstance(getConf());
+ Job job = Job.getInstance();
job.setOutputFormatClass(AccumuloOutputFormat.class);
Properties props = Accumulo.newClientProperties().to("myinstance","zoo1,zoo2")
.as("user", "passwd").build();
@@ -543,6 +582,32 @@ your job with yarn
command.
+Write output to RFiles in HDFS
+
+Follow the step below to have a MapReduce job output to RFiles in HDFS. These files
+can then be bulk imported into Accumulo:
+
+
+ - Create a Mapper or Reducer with Key & Value as output parameters.
+ class MyReducer extends Reducer<WritableComparable, Writable, Key, Value> {
+   public void reduce(WritableComparable key, Iterable<Writable> values, Context c)
+       throws IOException, InterruptedException {
+     Key outputKey;
+     Value outputValue;
+     // create Key & Value based on input
+     c.write(outputKey, outputValue);
+   }
+ }
+
+
+ - Configure your MapReduce job to use AccumuloFileOutputFormat.
+
Job job = Job.getInstance();
+ job.setOutputFormatClass(AccumuloFileOutputFormat.class);
+ AccumuloFileOutputFormat.configure()
+ .outputPath(new Path("hdfs://localhost:8020/myoutput/")).store(job);
+
+
+
+
The MapReduce example contains a complete example of using MapReduce with Accumulo.
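The RFiles written by AccumuloFileOutputFormat above can then be bulk imported into a table. A sketch using the 2.0 client API follows; the table name and connection details are hypothetical, and the builder methods should be checked against the 2.0 javadoc:

```java
import org.apache.accumulo.core.client.Accumulo;
import org.apache.accumulo.core.client.AccumuloClient;

try (AccumuloClient client = Accumulo.newClient()
        .to("myinstance", "zoo1,zoo2").as("user", "passwd").build()) {
  // Bulk import the RFiles produced by the MapReduce job into "mytable"
  client.tableOperations()
      .importDirectory("hdfs://localhost:8020/myoutput/")
      .to("mytable")
      .load();
}
```

Bulk import moves the files into the table without rewriting the data through the tablet servers’ write path, which is why it pairs well with AccumuloFileOutputFormat.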
diff --git a/feed.xml b/feed.xml
index 6530228..a7bcc03 100644
--- a/feed.xml
+++ b/feed.xml
@@ -6,8 +6,8 @@
https://accumulo.apache.org/
- Fri, 04 Jan 2019 13:40:54 -0500
- Fri, 04 Jan 2019 13:40:54 -0500
+ Mon, 07 Jan 2019 14:26:50 -0500
+ Mon, 07 Jan 2019 14:26:50 -0500
Jekyll v3.7.3
diff --git a/search_data.json b/search_data.json
index 554671f..90f31f2 100644
--- a/search_data.json
+++ b/search_data.json
@@ -51,7 +51,7 @@
"docs-2-x-administration-upgrading": {
"title": "Upgrading Accumulo",
- "content" : "Upgrading from 1.8/9 to 2.0Follow the steps below to upgrade your Accumulo instance and client to 2.0.Upgrade Accumulo instanceIMPORTANT! Before upgrading to Accumulo 2.0, you will need to upgrade to Java 8 and Hadoop 3.x.Upgrading to Accumulo 2.0 is done by stopping Accumulo 1.8/9 and starting Accumulo 2.0.Before stopping Accumulo 1.8/9, install Accumulo 2.0 and configure it by following the 2.0 quick start.There are several changes to scripts and configuration in 2. [...]
+ "content" : "Upgrading from 1.8/9 to 2.0Follow the steps below to upgrade your Accumulo instance and client to 2.0.Upgrade Accumulo instanceIMPORTANT! Before upgrading to Accumulo 2.0, you will need to upgrade to Java 8 and Hadoop 3.x.Upgrading to Accumulo 2.0 is done by stopping Accumulo 1.8/9 and starting Accumulo 2.0.Before stopping Accumulo 1.8/9, install Accumulo 2.0 and configure it by following the 2.0 quick start.There are several changes to scripts and configuration in 2. [...]
"url": " /docs/2.x/administration/upgrading",
"categories": "administration"
},
@@ -107,7 +107,7 @@
"docs-2-x-development-mapreduce": {
"title": "MapReduce",
- "content" : "Accumulo tables can be used as the source and destination of MapReduce jobs.General MapReduce configurationSince 2.0.0, Accumulo no longer has the same dependency versions (i.e Guava, etc) as Hadoop.When launching a MapReduce job that reads or writes to Accumulo, you should build a shaded jarwith all of your dependencies and complete the following steps so YARN only includes Hadoop code(and not all of Hadoop dependencies) when running your MapReduce job: Set expo [...]
+ "content" : "Accumulo tables can be used as the source and destination of MapReduce jobs.General MapReduce configurationAdd Accumulo’s MapReduce API to your dependenciesIf you are using Maven, add the following dependency to your pom.xml to use Accumulo’s MapReduce API:<dependency> <groupId>org.apache.accumulo</groupId> <artifactId>accumulo-hadoop-mapreduce</artifactId> <version>2.0.0-alpha-1&am [...]
"url": " /docs/2.x/development/mapreduce",
"categories": "development"
},