From: mwalch@apache.org
To: commits@accumulo.apache.org
Date: Tue, 15 Jan 2019 15:08:56 +0000
Subject: [accumulo-examples] branch master updated: More updates to MapReduce (#32)

This is an automated email from the ASF dual-hosted git repository.

mwalch pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/accumulo-examples.git


The following commit(s) were added to refs/heads/master by this push:
     new 26efc49  More updates to MapReduce (#32)
26efc49 is described below

commit 26efc4950978d1575d92f04d0c38042334f17ee0
Author: Mike Walch <mwalch@apache.org>
AuthorDate: Tue Jan 15 10:08:51 2019 -0500

    More updates to MapReduce (#32)
    
    * WordCount now supports using HDFS path for client props
    * Updated docs and fixed arguments to MapReduce job
---
 README.md                                          |  16 +--
 docs/compactionStrategy.md                         |   8 +-
 docs/dirlist.md                                    |  18 ++--
 docs/isolation.md                                  |   4 +-
 docs/mapred.md                                     | 114 ---------------------
 docs/sample.md                                     |   8 +-
 docs/uniquecols.md                                 |  23 +++++
 docs/wordcount.md                                  |  72 +++++++++++
 .../examples/mapreduce/TokenFileWordCount.java     | 107 -------------------
 .../accumulo/examples/mapreduce/WordCount.java     |  12 ++-
 .../accumulo/examples/mapreduce/MapReduceIT.java   |   2 +-
 11 files changed, 134 insertions(+), 250 deletions(-)

diff --git a/README.md b/README.md
index 3a8ff8f..77c91bc 100644
--- a/README.md
+++ b/README.md
@@ -24,7 +24,7 @@ Follow the steps below to run the Accumulo examples:
 
 1. Clone this repository
 
-    git clone https://github.com/apache/accumulo-examples.git
+    git clone https://github.com/apache/accumulo-examples.git
 
 2. Follow [Accumulo's quickstart][quickstart] to install and run an Accumulo instance.
    Accumulo has an [accumulo-client.properties] in `conf/` that must be configured as
@@ -34,13 +34,13 @@ Follow the steps below to run the Accumulo examples:
    are set in your shell, you may be able to skip this step. Make sure `ACCUMULO_CLIENT_PROPS`
   is set to the location of your [accumulo-client.properties].
 
-    cp conf/env.sh.example conf/env.sh
-    vim conf/env.sh
+    cp conf/env.sh.example conf/env.sh
+    vim conf/env.sh
 
 3. Build the examples repo and copy the examples jar to Accumulo's `lib/ext` directory:
 
-    ./bin/build
-    cp target/accumulo-examples.jar /path/to/accumulo/lib/ext/
+    ./bin/build
+    cp target/accumulo-examples.jar /path/to/accumulo/lib/ext/
 
 4. Each Accumulo example has its own documentation and instructions for running the example
    which are linked to below.
@@ -76,7 +76,6 @@ Each example below highlights a feature of Apache Accumulo.
 | [filter] | Using the AgeOffFilter to remove records more than 30 seconds old. |
 | [helloworld] | Inserting records both inside map/reduce jobs and outside, and reading records between two rows. |
 | [isolation] | Using the isolated scanner to ensure partial changes are not seen. |
-| [mapred] | Using MapReduce to read from and write to Accumulo tables. |
 | [maxmutation] | Limiting mutation size to avoid running out of memory. |
 | [regex] | Using MapReduce and Accumulo to find data using regular expressions. |
 | [reservations] | Using conditional mutations to implement a simple reservation system. |
@@ -86,7 +85,9 @@ Each example below highlights a feature of Apache Accumulo.
 | [shard] | Using the intersecting iterator with a term index partitioned by document. |
 | [tabletofile] | Using MapReduce to read a table and write one of its columns to a file in HDFS. |
 | [terasort] | Generating random data and sorting it using Accumulo. |
+| [uniquecols] | Using MapReduce to count unique columns in Accumulo. |
 | [visibility] | Using visibilities (or combinations of authorizations). Also shows user permissions. |
+| [wordcount] | Using MapReduce and Accumulo to count words in text files. |
 
 ## Release Testing
 
@@ -112,7 +113,6 @@ This repository can be used to test Accumulo release candidates. See
 [filter]: docs/filter.md
 [helloworld]: docs/helloworld.md
 [isolation]: docs/isolation.md
-[mapred]: docs/mapred.md
 [maxmutation]: docs/maxmutation.md
 [regex]: docs/regex.md
 [reservations]: docs/reservations.md
@@ -122,6 +122,8 @@ This repository can be used to test Accumulo release candidates. See
 [shard]: docs/shard.md
 [tabletofile]: docs/tabletofile.md
 [terasort]: docs/terasort.md
+[uniquecols]: docs/uniquecols.md
 [visibility]: docs/visibility.md
+[wordcount]: docs/wordcount.md
 [ti]: https://travis-ci.org/apache/accumulo-examples.svg?branch=master
 [tl]: https://travis-ci.org/apache/accumulo-examples

diff --git a/docs/compactionStrategy.md b/docs/compactionStrategy.md
index a7c96d5..6b5bebc 100644
--- a/docs/compactionStrategy.md
+++ b/docs/compactionStrategy.md
@@ -44,13 +44,13 @@ The commands below will configure the TwoTierCompactionStrategy to use gz compre
 
 Generate some data and files in order to test the strategy:
 
-    $ ./bin/runex client.SequentialBatchWriter -c ./examples.conf -t test1 --start 0 --num 10000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20
+    $ ./bin/runex client.SequentialBatchWriter -t test1 --start 0 --num 10000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20
     $ accumulo shell -u root -p secret -e "flush -t test1"
-    $ ./bin/runex client.SequentialBatchWriter -c ./examples.conf -t test1 --start 0 --num 11000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20
+    $ ./bin/runex client.SequentialBatchWriter -t test1 --start 0 --num 11000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20
     $ accumulo shell -u root -p secret -e "flush -t test1"
-    $ ./bin/runex client.SequentialBatchWriter -c ./examples.conf -t test1 --start 0 --num 12000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20
+    $ ./bin/runex client.SequentialBatchWriter -t test1 --start 0 --num 12000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20
     $ accumulo shell -u root -p secret -e "flush -t test1"
-    $ ./bin/runex client.SequentialBatchWriter -c ./examples.conf -t test1 --start 0 --num 13000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20
+    $ ./bin/runex client.SequentialBatchWriter -t test1 --start 0 --num 13000 --size 50 --batchMemory 20M --batchLatency 500 --batchThreads 20
     $ accumulo shell -u root -p secret -e "flush -t test1"
 
 View the tserver log in /logs for the compaction and find the name of the file that was compacted for your table. Print info about this file using the PrintInfo tool:

diff --git a/docs/dirlist.md b/docs/dirlist.md
index 3602d40..2b653cf 100644
--- a/docs/dirlist.md
+++ b/docs/dirlist.md
@@ -31,7 +31,7 @@ This example shows how to use Accumulo to store a file system history. It has th
 
 To begin, ingest some data with Ingest.java.
 
-    $ ./bin/runex dirlist.Ingest -c ./examples.conf --vis exampleVis --chunkSize 100000 /local/username/workspace
+    $ ./bin/runex dirlist.Ingest --vis exampleVis --chunkSize 100000 /local/username/workspace
 
 This may take some time if there are large files in the /local/username/workspace directory. If you use 0 instead of 100000 on the command line, the ingest will run much faster, but it will not put any file data into Accumulo (the dataTable will be empty). Note that running this example will create tables dirTable, indexTable, and dataTable in Accumulo that you should delete when you have completed the example.
@@ -43,26 +43,26 @@ To browse the data ingested, use Viewer.java. Be sure to give the "username" use
 
 then run the Viewer:
 
-    $ ./bin/runex dirlist.Viewer -c ./examples.conf -t dirTable --dataTable dataTable --auths exampleVis --path /local/username/workspace
+    $ ./bin/runex dirlist.Viewer -t dirTable --dataTable dataTable --auths exampleVis --path /local/username/workspace
 
 To list the contents of specific directories, use QueryUtil.java.
 
-    $ ./bin/runex dirlist.QueryUtil -c ./examples.conf -t dirTable --auths exampleVis --path /local/username
-    $ ./bin/runex dirlist.QueryUtil -c ./examples.conf -t dirTable --auths exampleVis --path /local/username/workspace
+    $ ./bin/runex dirlist.QueryUtil -t dirTable --auths exampleVis --path /local/username
+    $ ./bin/runex dirlist.QueryUtil -t dirTable --auths exampleVis --path /local/username/workspace
 
 To perform searches on file or directory names, also use QueryUtil.java. Search terms must contain no more than one wild card and cannot contain "/". *Note* these queries run on the _indexTable_ table instead of the dirTable table.
 
-    $ ./bin/runex dirlist.QueryUtil -c ./examples.conf -t indexTable --auths exampleVis --path filename --search
-    $ ./bin/runex dirlist.QueryUtil -c ./examples.conf -t indexTable --auths exampleVis --path 'filename*' --search
-    $ ./bin/runex dirlist.QueryUtil -c ./examples.conf -t indexTable --auths exampleVis --path '*jar' --search
-    $ ./bin/runex dirlist.QueryUtil -c ./examples.conf -t indexTable --auths exampleVis --path 'filename*jar' --search
+    $ ./bin/runex dirlist.QueryUtil -t indexTable --auths exampleVis --path filename --search
+    $ ./bin/runex dirlist.QueryUtil -t indexTable --auths exampleVis --path 'filename*' --search
+    $ ./bin/runex dirlist.QueryUtil -t indexTable --auths exampleVis --path '*jar' --search
+    $ ./bin/runex dirlist.QueryUtil -t indexTable --auths exampleVis --path 'filename*jar' --search
 
 To count the number of direct children (directories and files) and descendants (children and children's descendants, directories and files), run the FileCount over the dirTable table. The results are written back to the same table. FileCount reads from and writes to Accumulo. This requires scan authorizations for the read and a visibility for the data written. In this example, the authorizations and visibility are set to the same value, exampleVis. See the [visibility example][vis] for more information on visibility and authorizations.
 
-    $ ./bin/runex dirlist.FileCount -c ./examples.conf -t dirTable --auths exampleVis
+    $ ./bin/runex dirlist.FileCount -t dirTable --auths exampleVis
 
 ## Directory Table

diff --git a/docs/isolation.md b/docs/isolation.md
index d6dc5ac..a848af9 100644
--- a/docs/isolation.md
+++ b/docs/isolation.md
@@ -30,7 +30,7 @@ reading the row at the same time a mutation is changing the row.
 
 Below, Interference Test is run without isolation enabled for 5000 iterations
 and it reports problems.
 
-    $ ./bin/runex isolation.InterferenceTest -c ./examples.conf -t isotest --iterations 5000
+    $ ./bin/runex isolation.InterferenceTest -t isotest --iterations 5000
     ERROR Columns in row 053 had multiple values [53, 4553]
     ERROR Columns in row 061 had multiple values [561, 61]
     ERROR Columns in row 070 had multiple values [570, 1070]
@@ -43,7 +43,7 @@ and it reports problems.
 
 Below, Interference Test is run with isolation enabled for 5000 iterations
 and it reports no problems.
 
-    $ ./bin/runex isolation.InterferenceTest -c ./examples.conf -t isotest --iterations 5000 --isolated
+    $ ./bin/runex isolation.InterferenceTest -t isotest --iterations 5000 --isolated
     finished

diff --git a/docs/mapred.md b/docs/mapred.md
deleted file mode 100644
index d370792..0000000
--- a/docs/mapred.md
+++ /dev/null
@@ -1,114 +0,0 @@
-
-# Apache Accumulo MapReduce Example
-
-## WordCount Example
-
-The WordCount example ([WordCount.java]) uses MapReduce and Accumulo to compute
-word counts for a set of documents. This is accomplished using a map-only MapReduce
-job and a Accumulo table with combiners.
-
-
-To run this example, create a directory in HDFS containing text files. You can
-use the Accumulo README for data:
-
-    $ hdfs dfs -mkdir /wc
-    $ hdfs dfs -copyFromLocal /path/to/accumulo/README.md /wc/README.md
-
-Verify that the file was created:
-
-    $ hdfs dfs -ls /wc
-
-After creating the table, run the WordCount MapReduce job with your HDFS input directory:
-
-    $ ./bin/runmr mapreduce.WordCount -i /wc
-
-[WordCount.java] creates an Accumulo table (named with a SummingCombiner iterator
-attached to it. It runs a map-only M/R job that reads the specified HDFS directory containing text files and
-writes word counts to Accumulo table.
-
-After the MapReduce job completes, query the Accumulo table to see word counts.
-
-    $ accumulo shell
-    username@instance> table wordCount
-    username@instance wordCount> scan -b the
-    the count:20080906 [] 75
-    their count:20080906 [] 2
-    them count:20080906 [] 1
-    then count:20080906 [] 1
-    ...
-
-Another example to look at is
-org.apache.accumulo.examples.mapreduce.UniqueColumns. This example
-computes the unique set of columns in a table and shows how a map reduce job
-can directly read a tables files from HDFS.
-
-One more example available is
-org.apache.accumulo.examples.mapreduce.TokenFileWordCount.
-The TokenFileWordCount example works exactly the same as the WordCount example
-explained above except that it uses a token file rather than giving the
-password directly to the map-reduce job (this avoids having the password
-displayed in the job's configuration which is world-readable).
-
-To create a token file, use the create-token utility
-
-    $ accumulo create-token
-
-It defaults to creating a PasswordToken, but you can specify the token class
-with -tc (requires the fully qualified class name). Based on the token class,
-it will prompt you for each property required to create the token.
-
-The last value it prompts for is a local filename to save to. If this file
-exists, it will append the new token to the end. Multiple tokens can exist in
-a file, but only the first one for each user will be recognized.
-
-Rather than waiting for the prompts, you can specify some options when calling
-create-token, for example
-
-    $ accumulo create-token -u root -p secret -f root.pw
-
-would create a token file containing a PasswordToken for
-user 'root' with password 'secret' and saved to 'root.pw'
-
-This local file needs to be uploaded to hdfs to be used with the
-map-reduce job. For example, if the file were 'root.pw' in the local directory:
-
-    $ hadoop fs -put root.pw root.pw
-
-This would put 'root.pw' in the user's home directory in hdfs.
-
-Because the basic WordCount example uses Opts to parse its arguments
-(which extends ClientOnRequiredTable), you can use a token file with
-the basic WordCount example by calling the same command as explained above
-except replacing the password with the token file (rather than -p, use -tf).
-
-    $ ./bin/runmr mapreduce.WordCount --input /user/username/wc -t wordCount -u username -tf tokenfile
-
-In the above examples, username was 'root' and tokenfile was 'root.pw'
-
-However, if you don't want to use the Opts class to parse arguments,
-the TokenFileWordCount is an example of using the token file manually.
-
-    $ ./bin/runmr mapreduce.TokenFileWordCount instance zookeepers username tokenfile /user/username/wc wordCount
-
-The results should be the same as the WordCount example except that the
-authentication token was not stored in the configuration. It was instead
-stored in a file that the map-reduce job pulled into the distributed cache.
-(If you ran either of these on the same table right after the
-WordCount example, then the resulting counts should just double.)
-
-[WordCount.java]: ../src/main/java/org/apache/accumulo/examples/mapreduce/WordCount.java

diff --git a/docs/sample.md b/docs/sample.md
index 1f6cae5..4c58c3a 100644
--- a/docs/sample.md
+++ b/docs/sample.md
@@ -88,7 +88,7 @@ failure and fixing the problem with a compaction.
 The example above is replicated in a java program using the Accumulo API. Below
 is the program name and the command to run it.
 
-    ./bin/runex sample.SampleExample -c ./examples.conf
+    ./bin/runex sample.SampleExample
 
 The commands below look under the hood to give some insight into how this
 feature works. The commands determine what files the sampex table is using.
@@ -166,13 +166,13 @@ shard table based on the column qualifier.
 
 After enabling sampling, the command below counts the number of documents in
 the sample containing the words `import` and `int`.
 
-    $ ./bin/runex shard.Query --sample -c ./examples.conf -t shard import int | fgrep '.java' | wc
+    $ ./bin/runex shard.Query --sample -t shard import int | fgrep '.java' | wc
      11      11    1246
 
 The command below counts the total number of documents containing the words
 `import` and `int`.
 
-    $ ./bin/runex shard.Query -c ./examples.conf -t shard import int | fgrep '.java' | wc
+    $ ./bin/runex shard.Query -t shard import int | fgrep '.java' | wc
    1085    1085  118175
 
 The counts 11 out of 1085 total are around what would be expected for a modulus
@@ -188,4 +188,4 @@ To experiment with this iterator, use the following command.
 The `--sampleCutoff` option below will cause the query to return nothing if based
 on the sample it appears a query would return more than 1000 documents.
 
-    $ ./bin/runex shard.Query --sampleCutoff 1000 -c ./examples.conf -t shard import int | fgrep '.java' | wc
+    $ ./bin/runex shard.Query --sampleCutoff 1000 -t shard import int | fgrep '.java' | wc

diff --git a/docs/uniquecols.md b/docs/uniquecols.md
new file mode 100644
index 0000000..46b6a30
--- /dev/null
+++ b/docs/uniquecols.md
@@ -0,0 +1,23 @@
+
+# Apache Accumulo Unique Columns example
+
+The UniqueColumns example ([UniqueColumns.java]) computes the unique set
+of columns in a table and shows how a MapReduce job can directly read a
+table's files from HDFS.
+
+[UniqueColumns.java]: ../src/main/java/org/apache/accumulo/examples/mapreduce/UniqueColumns.java

diff --git a/docs/wordcount.md b/docs/wordcount.md
new file mode 100644
index 0000000..601f1de
--- /dev/null
+++ b/docs/wordcount.md
@@ -0,0 +1,72 @@
+
+# Apache Accumulo Word Count example
+
+The WordCount example ([WordCount.java]) uses MapReduce and Accumulo to compute
+word counts for a set of documents. This is accomplished using a map-only MapReduce
+job and an Accumulo table with combiners.
+
+To run this example, create a directory in HDFS containing text files. You can
+use the Accumulo README for data:
+
+    $ hdfs dfs -mkdir /wc
+    $ hdfs dfs -copyFromLocal /path/to/accumulo/README.md /wc/README.md
+
+Verify that the file was created:
+
+    $ hdfs dfs -ls /wc
+
+Next, run the WordCount MapReduce job with your HDFS input directory:
+
+    $ ./bin/runmr mapreduce.WordCount -i /wc
+
+[WordCount.java] creates an Accumulo table (named `wordCount`) with a SummingCombiner iterator
+attached to it. It runs a map-only M/R job that reads the specified HDFS directory containing text files and
+writes word counts to the Accumulo table.
+
+After the MapReduce job completes, query the Accumulo table to see word counts.
+
+    $ accumulo shell
+    username@instance> table wordCount
+    username@instance wordCount> scan -b the
+    the count:20080906 [] 75
+    their count:20080906 [] 2
+    them count:20080906 [] 1
+    then count:20080906 [] 1
+    ...
+
+When the WordCount MapReduce job was run above, the client properties were serialized
+into the MapReduce configuration. This is insecure if the properties contain sensitive
+information like passwords. A more secure option is to store accumulo-client.properties
+in HDFS and run the job with the `-d` option. This will configure the MapReduce job
+to obtain the client properties from HDFS:
+
+    $ hdfs dfs -copyFromLocal ./conf/accumulo-client.properties /user/myuser/
+    $ ./bin/runmr mapreduce.WordCount -i /wc -t wordCount2 -d /user/myuser/accumulo-client.properties
+
+After the MapReduce job completes, query the `wordCount2` table. The results should
+be the same as before:
+
+    $ accumulo shell
+    username@instance> table wordCount2
+    username@instance wordCount2> scan -b the
+    the count:20080906 [] 75
+    their count:20080906 [] 2
+    ...
+
+
+[WordCount.java]: ../src/main/java/org/apache/accumulo/examples/mapreduce/WordCount.java

diff --git a/src/main/java/org/apache/accumulo/examples/mapreduce/TokenFileWordCount.java b/src/main/java/org/apache/accumulo/examples/mapreduce/TokenFileWordCount.java
deleted file mode 100644
index 010989c..0000000
--- a/src/main/java/org/apache/accumulo/examples/mapreduce/TokenFileWordCount.java
+++ /dev/null
@@ -1,107 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one or more
- * contributor license agreements.  See the NOTICE file distributed with
- * this work for additional information regarding copyright ownership.
- * The ASF licenses this file to You under the Apache License, Version 2.0
- * (the "License"); you may not use this file except in compliance with
- * the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.accumulo.examples.mapreduce;
-
-import java.io.IOException;
-
-import org.apache.accumulo.core.client.ClientConfiguration;
-import org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat;
-import org.apache.accumulo.core.data.Mutation;
-import org.apache.accumulo.core.data.Value;
-import org.apache.hadoop.conf.Configuration;
-import org.apache.hadoop.conf.Configured;
-import org.apache.hadoop.io.LongWritable;
-import org.apache.hadoop.io.Text;
-import org.apache.hadoop.mapreduce.Job;
-import org.apache.hadoop.mapreduce.Mapper;
-import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
-import org.apache.hadoop.util.Tool;
-import org.apache.hadoop.util.ToolRunner;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-/**
- * A simple map reduce job that inserts word counts into accumulo. See the README for instructions
- * on how to run this. This version does not use the ClientOpts class to parse arguments as an
- * example of using AccumuloInputFormat and AccumuloOutputFormat directly. See README.mapred for
- * more details.
- *
- */
-public class TokenFileWordCount extends Configured implements Tool {
-
-  private static final Logger log = LoggerFactory.getLogger(TokenFileWordCount.class);
-
-  public static class MapClass extends Mapper<LongWritable,Text,Text,Mutation> {
-    @Override
-    public void map(LongWritable key, Text value, Context output) throws IOException {
-      String[] words = value.toString().split("\\s+");
-
-      for (String word : words) {
-
-        Mutation mutation = new Mutation(new Text(word));
-        mutation.put(new Text("count"), new Text("20080906"), new Value("1".getBytes()));
-
-        try {
-          output.write(null, mutation);
-        } catch (InterruptedException e) {
-          log.error("Could not write to Context.", e);
-        }
-      }
-    }
-  }
-
-  @Override
-  public int run(String[] args) throws Exception {
-
-    String instance = args[0];
-    String zookeepers = args[1];
-    String user = args[2];
-    String tokenFile = args[3];
-    String input = args[4];
-    String tableName = args[5];
-
-    Job job = Job.getInstance(getConf());
-    job.setJobName(TokenFileWordCount.class.getName());
-    job.setJarByClass(this.getClass());
-
-    job.setInputFormatClass(TextInputFormat.class);
-    TextInputFormat.setInputPaths(job, input);
-
-    job.setMapperClass(MapClass.class);
-
-    job.setNumReduceTasks(0);
-
-    job.setOutputFormatClass(AccumuloOutputFormat.class);
-    job.setOutputKeyClass(Text.class);
-    job.setOutputValueClass(Mutation.class);
-
-    // AccumuloInputFormat not used here, but it uses the same functions.
-    AccumuloOutputFormat.setZooKeeperInstance(job,
-        ClientConfiguration.loadDefault().withInstance(instance).withZkHosts(zookeepers));
-    AccumuloOutputFormat.setConnectorInfo(job, user, tokenFile);
-    AccumuloOutputFormat.setCreateTables(job, true);
-    AccumuloOutputFormat.setDefaultTableName(job, tableName);
-
-    job.waitForCompletion(true);
-    return job.isSuccessful() ? 0 : 1;
-  }
-
-  public static void main(String[] args) throws Exception {
-    int res = ToolRunner.run(new Configuration(), new TokenFileWordCount(), args);
-    System.exit(res);
-  }
-}

diff --git a/src/main/java/org/apache/accumulo/examples/mapreduce/WordCount.java b/src/main/java/org/apache/accumulo/examples/mapreduce/WordCount.java
index 5bc4c70..1864fe3 100644
--- a/src/main/java/org/apache/accumulo/examples/mapreduce/WordCount.java
+++ b/src/main/java/org/apache/accumulo/examples/mapreduce/WordCount.java
@@ -51,6 +51,9 @@ public class WordCount {
     String tableName = "wordCount";
     @Parameter(names = {"-i", "--input"}, required = true, description = "HDFS input directory")
     String inputDirectory;
+    @Parameter(names = {"-d", "--dfsPath"},
+        description = "HDFS Path where accumulo-client.properties exists")
+    String hdfsPath;
   }
 
   public static class MapClass extends Mapper<LongWritable,Text,Text,Mutation> {
@@ -101,8 +104,13 @@ public class WordCount {
     job.setOutputKeyClass(Text.class);
     job.setOutputValueClass(Mutation.class);
 
-    AccumuloOutputFormat.configure().clientProperties(opts.getClientProperties())
-        .defaultTable(opts.tableName).store(job);
+    if (opts.hdfsPath != null) {
+      AccumuloOutputFormat.configure().clientPropertiesPath(opts.hdfsPath)
+          .defaultTable(opts.tableName).store(job);
+    } else {
+      AccumuloOutputFormat.configure().clientProperties(opts.getClientProperties())
+          .defaultTable(opts.tableName).store(job);
+    }
     System.exit(job.waitForCompletion(true) ? 0 : 1);
   }
 }

diff --git a/src/test/java/org/apache/accumulo/examples/mapreduce/MapReduceIT.java b/src/test/java/org/apache/accumulo/examples/mapreduce/MapReduceIT.java
index a5c83c0..d66aa0b 100644
--- a/src/test/java/org/apache/accumulo/examples/mapreduce/MapReduceIT.java
+++ b/src/test/java/org/apache/accumulo/examples/mapreduce/MapReduceIT.java
@@ -63,7 +63,7 @@ public class MapReduceIT extends ConfigurableMacBase {
 
   @Test
   public void test() throws Exception {
-    String confFile = System.getProperty("user.dir") + "/target/examples.conf";
+    String confFile = System.getProperty("user.dir") + "/target/accumulo-client.properties";
     String instance = getClientInfo().getInstanceName();
    String keepers = getClientInfo().getZooKeepers();
    ExamplesIT.writeClientPropsFile(confFile, instance, keepers, "root", ROOT_PASSWORD);
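
Taken together, the patch leaves WordCount as a map-only job that emits one Mutation per word and relies on the table's SummingCombiner to total the counts, with client properties optionally read from HDFS. The following is a minimal end-to-end sketch of that pattern, not code from the repository: it combines the mapper shape from the deleted TokenFileWordCount with the builder-style output configuration the patch introduces. The `WordCountSketch` class name and the `org.apache.accumulo.hadoop.mapreduce` import path (the Accumulo 2.0 hadoop-mapreduce module, not shown in the diff) are assumptions for illustration.

    package org.apache.accumulo.examples.mapreduce;

    import java.io.IOException;

    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    // Assumed import: the diff does not show WordCount's import block.
    import org.apache.accumulo.hadoop.mapreduce.AccumuloOutputFormat;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class WordCountSketch {

      // Same shape as the mappers above: one Mutation per word, no reducer.
      public static class MapClass extends Mapper<LongWritable,Text,Text,Mutation> {
        @Override
        public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          for (String word : value.toString().split("\\s+")) {
            Mutation mutation = new Mutation(new Text(word));
            // Every occurrence writes a "1"; the SummingCombiner attached to the
            // table adds the values for a row/column at scan and compaction time.
            mutation.put(new Text("count"), new Text("20080906"), new Value("1".getBytes()));
            context.write(null, mutation); // null key means use the default table
          }
        }
      }

      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setJobName(WordCountSketch.class.getName());
        job.setJarByClass(WordCountSketch.class);

        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.setInputPaths(job, args[0]); // HDFS directory of text files

        job.setMapperClass(MapClass.class);
        job.setNumReduceTasks(0); // map-only

        job.setOutputFormatClass(AccumuloOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Mutation.class);

        // The new code path from the patch: point the job at client properties
        // stored in HDFS (hypothetical path) so credentials are never serialized
        // into the world-readable job configuration.
        AccumuloOutputFormat.configure()
            .clientPropertiesPath("/user/myuser/accumulo-client.properties")
            .defaultTable("wordCount")
            .store(job);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }

Writing a "1" per occurrence and summing server-side is what makes the reducer unnecessary; it is also why rerunning the job against the same table doubles the counts, as the deleted mapred.md noted.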