Title: Apache Accumulo MapReduce Example
Notice: Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements. See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership. The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License. You may obtain a copy of the License at
    .
      http://www.apache.org/licenses/LICENSE-2.0
    .
    Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied. See the License for the
    specific language governing permissions and limitations
    under the License.

This example uses MapReduce and Accumulo to compute word counts for a set of
documents. This is accomplished using a map-only MapReduce job and an
Accumulo table with combiners.
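The heart of the job is a mapper that writes one Mutation per word with a
value of 1; the SummingCombiner attached to the table adds those 1s together,
which is why no reduce phase is needed. Below is a minimal sketch modeled on
the bundled WordCount class; the column layout (a count family with a date
qualifier) follows the scan output shown later, and the class name
WordCountMapper is illustrative.

    import java.io.IOException;

    import org.apache.accumulo.core.data.Mutation;
    import org.apache.accumulo.core.data.Value;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class WordCountMapper extends Mapper<LongWritable,Text,Text,Mutation> {
      @Override
      public void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // Emit a count of 1 for every whitespace-delimited token. The
        // SummingCombiner on the table adds the 1s together at scan and
        // compaction time, so no reduce phase is required.
        for (String word : value.toString().split("\\s+")) {
          Mutation m = new Mutation(new Text(word));
          m.put(new Text("count"), new Text("20080906"), new Value("1".getBytes()));
          // A null key sends the mutation to the job's default table.
          context.write(null, m);
        }
      }
    }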
To run this example you will need a directory in HDFS containing text files.
The Accumulo README will be used to show how to run this example.

    $ hadoop fs -copyFromLocal $ACCUMULO_HOME/README /user/username/wc/Accumulo.README
    $ hadoop fs -ls /user/username/wc
    Found 1 items
    -rw-r--r--   2 username supergroup       9359 2009-07-15 17:54 /user/username/wc/Accumulo.README

The first part of running this example is to create a table with a combiner
on the count column family.

    $ ./bin/accumulo shell -u username -p password
    Shell - Apache Accumulo Interactive Shell
    - version: 1.5.0
    - instance name: instance
    - instance id: 00000000-0000-0000-0000-000000000000
    -
    - type 'help' for a list of available commands
    -
    username@instance> createtable wordCount
    username@instance wordCount> setiter -class org.apache.accumulo.core.iterators.user.SummingCombiner -p 10 -t wordCount -majc -minc -scan
    SummingCombiner interprets Values as Longs and adds them together. A variety of encodings (variable length, fixed length, or string) are available
    ----------> set SummingCombiner parameter all, set to true to apply Combiner to every column, otherwise leave blank. if true, columns option will be ignored.: false
    ----------> set SummingCombiner parameter columns, <col fam>[:<col qual>]{,<col fam>[:<col qual>]} escape non-alphanum chars using %<hex>.: count
    ----------> set SummingCombiner parameter lossy, if true, failed decodes are ignored. Otherwise combiner will error on failed decodes (default false): : false
    ----------> set SummingCombiner parameter type, <VARLEN|FIXEDLEN|STRING|fullClassName>: STRING
    username@instance wordCount> quit
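The same combiner can also be attached from the Java client API instead of
the shell. The following is a minimal sketch mirroring the setiter command
above; the instance name, ZooKeeper hosts, and credentials are the
placeholders used throughout this example.

    import java.util.Collections;

    import org.apache.accumulo.core.client.Connector;
    import org.apache.accumulo.core.client.IteratorSetting;
    import org.apache.accumulo.core.client.ZooKeeperInstance;
    import org.apache.accumulo.core.client.security.tokens.PasswordToken;
    import org.apache.accumulo.core.iterators.Combiner;
    import org.apache.accumulo.core.iterators.LongCombiner;
    import org.apache.accumulo.core.iterators.user.SummingCombiner;

    public class CreateWordCountTable {
      public static void main(String[] args) throws Exception {
        Connector conn = new ZooKeeperInstance("instance", "zookeepers")
            .getConnector("username", new PasswordToken("password"));
        conn.tableOperations().create("wordCount");

        // Priority 10; attachIterator applies the iterator at the scan,
        // minc, and majc scopes by default, matching -p 10 -majc -minc -scan.
        IteratorSetting is = new IteratorSetting(10, "sum", SummingCombiner.class);
        // Combine only the count column family.
        Combiner.setColumns(is, Collections.singletonList(new IteratorSetting.Column("count")));
        // Values are longs encoded as strings, matching the STRING type above.
        LongCombiner.setEncodingType(is, LongCombiner.Type.STRING);
        conn.tableOperations().attachIterator("wordCount", is);
      }
    }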
After creating the table, run the WordCount MapReduce job.

    $ bin/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.WordCount -i instance -z zookeepers --input /user/username/wc -t wordCount -u username -p password

    11/02/07 18:20:11 INFO input.FileInputFormat: Total input paths to process : 1
    11/02/07 18:20:12 INFO mapred.JobClient: Running job: job_201102071740_0003
    11/02/07 18:20:13 INFO mapred.JobClient:  map 0% reduce 0%
    11/02/07 18:20:20 INFO mapred.JobClient:  map 100% reduce 0%
    11/02/07 18:20:22 INFO mapred.JobClient: Job complete: job_201102071740_0003
    11/02/07 18:20:22 INFO mapred.JobClient: Counters: 6
    11/02/07 18:20:22 INFO mapred.JobClient:   Job Counters
    11/02/07 18:20:22 INFO mapred.JobClient:     Launched map tasks=1
    11/02/07 18:20:22 INFO mapred.JobClient:     Data-local map tasks=1
    11/02/07 18:20:22 INFO mapred.JobClient:   FileSystemCounters
    11/02/07 18:20:22 INFO mapred.JobClient:     HDFS_BYTES_READ=10487
    11/02/07 18:20:22 INFO mapred.JobClient:   Map-Reduce Framework
    11/02/07 18:20:22 INFO mapred.JobClient:     Map input records=255
    11/02/07 18:20:22 INFO mapred.JobClient:     Spilled Records=0
    11/02/07 18:20:22 INFO mapred.JobClient:     Map output records=1452

After the MapReduce job completes, query the Accumulo table to see word
counts.

    $ ./bin/accumulo shell -u username -p password
    username@instance> table wordCount
    username@instance wordCount> scan -b the
    the count:20080906 []    75
    their count:20080906 []    2
    them count:20080906 []    1
    then count:20080906 []    1
    there count:20080906 []    1
    these count:20080906 []    3
    this count:20080906 []    6
    through count:20080906 []    1
    time count:20080906 []    3
    time. count:20080906 []    1
    to count:20080906 []    27
    total count:20080906 []    1
    tserver, count:20080906 []    1
    tserver.compaction.major.concurrent.max count:20080906 []    1
    ...

Another example to look at is
org.apache.accumulo.examples.simple.mapreduce.UniqueColumns. This example
computes the unique set of columns in a table and shows how a MapReduce job
can directly read a table's files from HDFS.

One more example available is
org.apache.accumulo.examples.simple.mapreduce.TokenFileWordCount.
The TokenFileWordCount example works exactly the same as the WordCount example
explained above, except that it uses a token file rather than giving the
password directly to the MapReduce job (this avoids having the password
displayed in the job's configuration, which is world-readable).

To create a token file, use the create-token utility:

    $ ./bin/accumulo create-token

It defaults to creating a PasswordToken, but you can specify the token class
with -tc (requires the fully qualified class name). Based on the token class,
it will prompt you for each property required to create the token.

The last value it prompts for is a local filename to save to. If this file
exists, it will append the new token to the end. Multiple tokens can exist in
a file, but only the first one for each user will be recognized.

Rather than waiting for the prompts, you can specify some options when calling
create-token, for example:

    $ ./bin/accumulo create-token -u root -p secret -f root.pw

would create a token file containing a PasswordToken for
user 'root' with password 'secret' and save it to 'root.pw'.

This local file needs to be uploaded to HDFS to be used with the
MapReduce job. For example, if the file were 'root.pw' in the local directory:

    $ hadoop fs -put root.pw root.pw

This would put 'root.pw' in the user's home directory in HDFS.

Because the basic WordCount example uses Opts (which extends
ClientOnRequiredTable) to parse its arguments, you can use a token file with
the basic WordCount example by calling the same command as explained above,
except replacing the password with the token file (rather than -p, use -tf).

    $ ./bin/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.WordCount -i instance -z zookeepers --input /user/username/wc -t wordCount -u username -tf tokenfile

In the above examples, username was 'root' and tokenfile was 'root.pw'.

However, if you don't want to use the Opts class to parse arguments,
the TokenFileWordCount is an example of using the token file manually.

    $ bin/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.TokenFileWordCount instance zookeepers username tokenfile /user/username/wc wordCount

The results should be the same as the WordCount example, except that the
authentication token was not stored in the configuration. It was instead
stored in a file that the MapReduce job pulled into the distributed cache.
(If you ran either of these on the same table right after the
WordCount example, then the resulting counts should just double.)
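To show how the token file plugs into job setup, here is a sketch of the
relevant configuration calls, assuming the token-file overload of
setConnectorInfo that TokenFileWordCount relies on; the instance name,
ZooKeeper hosts, and the 'root.pw' HDFS path are the placeholders used above,
and the class name TokenFileJobSetup is illustrative.

    import org.apache.accumulo.core.client.mapreduce.AccumuloOutputFormat;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class TokenFileJobSetup {
      public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "wordCount");
        job.setJarByClass(TokenFileJobSetup.class);

        // Pass the HDFS path of the token file instead of a PasswordToken.
        // The job configuration then holds only the path; the token itself
        // reaches the tasks through the distributed cache.
        AccumuloOutputFormat.setConnectorInfo(job, "username", "root.pw");
        AccumuloOutputFormat.setZooKeeperInstance(job, "instance", "zookeepers");
        AccumuloOutputFormat.setDefaultTableName(job, "wordCount");
        AccumuloOutputFormat.setCreateTables(job, true);
        job.setOutputFormatClass(AccumuloOutputFormat.class);
        // ... set the input path and mapper class as in the WordCount
        // example, then submit with job.waitForCompletion(true).
      }
    }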