http://git-wip-us.apache.org/repos/asf/accumulo/blob/c0655661/1.3/user_manual/examples/dirlist.html
diff --git a/1.3/user_manual/examples/dirlist.html b/1.3/user_manual/examples/dirlist.html
new file mode 100644
index 0000000..ef0eda2
--- /dev/null
+++ b/1.3/user_manual/examples/dirlist.html
@@ -0,0 +1,239 @@

File System Archive

+ +

This example shows how to use Accumulo to store a file system history. It has the following classes:

+ +
  • Ingest.java - Recursively lists the files and directories under a given path, ingests their names and file info (not the file data!) into an Accumulo table, and indexes the file names in a separate table.
  • QueryUtil.java - Provides utility methods for getting the info for a file, listing the contents of a directory, and performing single wild card searches on file or directory names.
  • Viewer.java - Provides a GUI for browsing the file system information stored in Accumulo.
  • FileCountMR.java - Runs MR over the file system information and writes out counts to an Accumulo table.
  • FileCount.java - Accomplishes the same thing as FileCountMR, but in a different way. Computes recursive counts and stores them back into the table.
  • StringArraySummation.java - Aggregates counts for the FileCountMR reducer.
+ +

To begin, ingest some data with Ingest.java.

+ +
$ ./bin/accumulo org.apache.accumulo.examples.dirlist.Ingest instance zookeepers username password direxample dirindex exampleVis /local/user1/workspace
+
+
+ +

Note that running this example will create tables direxample and dirindex in Accumulo that you should delete when you have completed the example. If you modify a file or add new files in the ingested directory (e.g. /local/user1/workspace), you can run Ingest again to add the new information to the Accumulo tables.
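
The tables depend on row keys that keep each directory level sorted together. A common scheme for this kind of table (a simplification — Ingest.java's actual encoding may differ, and buildRow is a hypothetical helper) is to prefix each path with its zero-padded depth:

```java
public class DirlistRowSketch {
    // Hypothetical helper: encode a path as "<zero-padded depth><path>" so that
    // all rows at one directory depth sort contiguously in the table.
    public static String buildRow(String path) {
        int depth = 0;
        for (char c : path.toCharArray())
            if (c == '/') depth++;
        return String.format("%03d%s", depth, path);
    }

    public static void main(String[] args) {
        System.out.println(buildRow("/local/user1/workspace")); // prints 003/local/user1/workspace
    }
}
```

Because "001..." sorts before "002...", a scan over a single depth prefix returns exactly one level of the hierarchy.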

+ +

To browse the data ingested, use Viewer.java. Be sure to give the “username” user the authorizations to see the data.

+ +
$ ./bin/accumulo org.apache.accumulo.examples.dirlist.Viewer instance zookeepers username password direxample exampleVis /local/user1/workspace
+
+
+ +

To list the contents of specific directories, use QueryUtil.java.

+ +
$ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password direxample exampleVis /local/user1
+$ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password direxample exampleVis /local/user1/workspace
+
+
+ +

To perform searches on file or directory names, also use QueryUtil.java. Search terms must contain no more than one wild card and cannot contain “/”. Note that these queries run on the dirindex table instead of the direxample table.

+ +
$ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password dirindex exampleVis filename -search
+$ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password dirindex exampleVis 'filename*' -search
+$ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password dirindex exampleVis '*jar' -search
+$ ./bin/accumulo org.apache.accumulo.examples.dirlist.QueryUtil instance zookeepers username password dirindex exampleVis filename*jar -search
+
+
+ +

To count the number of direct children (directories and files) and descendants (children and children’s descendants, both directories and files), run FileCountMR over the direxample table. The results can be written back to the same table.

+ +
$ ./bin/tool.sh lib/accumulo-examples-*[^c].jar org.apache.accumulo.examples.dirlist.FileCountMR instance zookeepers username password direxample direxample exampleVis exampleVis
+
+
+ +

Alternatively, you can run FileCount.java.
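
The recursive count FileCount computes can be illustrated without Accumulo on the classpath. The sketch below (names are mine, not the actual class) counts a node's descendants as its direct children plus, recursively, their descendants:

```java
import java.util.List;
import java.util.Map;

public class CountSketch {
    // children maps a directory to its direct entries. The descendant count of
    // a node is its direct children plus, recursively, their descendants.
    public static int descendants(Map<String, List<String>> children, String node) {
        int total = 0;
        for (String child : children.getOrDefault(node, List.of()))
            total += 1 + descendants(children, child);
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<String>> tree = Map.of(
            "/", List.of("/a", "/b"),
            "/a", List.of("/a/x"));
        System.out.println(descendants(tree, "/")); // prints 3: /a, /b and /a/x
    }
}
```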

+ +
http://git-wip-us.apache.org/repos/asf/accumulo/blob/c0655661/1.3/user_manual/examples/filter.html
diff --git a/1.3/user_manual/examples/filter.html b/1.3/user_manual/examples/filter.html
new file mode 100644
index 0000000..5a191af
--- /dev/null
+++ b/1.3/user_manual/examples/filter.html
@@ -0,0 +1,283 @@

Filter Example

+ +

This is a simple filter example. It uses the AgeOffFilter that is provided as part of the core package org.apache.accumulo.core.iterators.filter. Filters are used by the FilteringIterator to select desired key/value pairs (or weed out undesired ones). Filters implement the org.apache.accumulo.core.iterators.filter.Filter interface, which contains a method accept(Key k, Value v). This method returns true if the key/value pair is to be delivered and false if it is to be ignored.
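
The accept contract is easy to illustrate in plain Java. The following is a simplified stand-in for AgeOffFilter — not the real class — that keeps an entry only while its timestamp is within ttl milliseconds of the current time:

```java
public class AgeOffSketch {
    private final long ttl; // time to live, in milliseconds

    public AgeOffSketch(long ttl) { this.ttl = ttl; }

    // Simplified analogue of Filter.accept(Key k, Value v): deliver the entry
    // only if its timestamp is no more than ttl milliseconds behind currentTime.
    public boolean accept(long timestamp, long currentTime) {
        return currentTime - timestamp <= ttl;
    }

    public static void main(String[] args) {
        AgeOffSketch filter = new AgeOffSketch(30000);
        System.out.println(filter.accept(0, 10000)); // true: 10 seconds old
        System.out.println(filter.accept(0, 40000)); // false: 40 seconds old
    }
}
```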

+ +
username@instance> createtable filtertest
+username@instance filtertest> setiter -t filtertest -scan -p 10 -n myfilter -filter
+FilteringIterator uses Filters to accept or reject key/value pairs
+----------> entering options: <filterPriorityNumber> <ageoff|regex|filterClass>
+----------> set org.apache.accumulo.core.iterators.FilteringIterator option (<name> <value>, hit enter to skip): 0 ageoff
+----------> set org.apache.accumulo.core.iterators.FilteringIterator option (<name> <value>, hit enter to skip): 
+AgeOffFilter removes entries with timestamps more than <ttl> milliseconds old
+----------> set org.apache.accumulo.core.iterators.filter.AgeOffFilter parameter currentTime, if set, use the given value as the absolute time in milliseconds as the current time of day: 
+----------> set org.apache.accumulo.core.iterators.filter.AgeOffFilter parameter ttl, time to live (milliseconds): 30000
+username@instance filtertest> 
+
+username@instance filtertest> scan
+username@instance filtertest> insert foo a b c
+username@instance filtertest> scan
+foo a:b []    c
+
+
+ +

… wait 30 seconds …

+ +
username@instance filtertest> scan
+username@instance filtertest>
+
+
+ +

Note the absence of the entry inserted more than 30 seconds ago. Since the +scope was set to “scan”, this means the entry is still in Accumulo, but is +being filtered out at query time. To delete entries from Accumulo based on +the ages of their timestamps, AgeOffFilters should be set up for the “minc” +and “majc” scopes, as well.

+ +

To force an age-off of the persisted data, flush and compact your table after setting up the age-off iterator on the “minc” and “majc” scopes. Compactions happen automatically as a background operation on any table that is being actively written to, but these are the commands to force one:

+ +
username@instance filtertest> setiter -t filtertest -scan -minc -majc -p 10 -n myfilter -filter
+FilteringIterator uses Filters to accept or reject key/value pairs
+----------> entering options: <filterPriorityNumber> <ageoff|regex|filterClass>
+----------> set org.apache.accumulo.core.iterators.FilteringIterator option (<name> <value>, hit enter to skip): 0 ageoff
+----------> set org.apache.accumulo.core.iterators.FilteringIterator option (<name> <value>, hit enter to skip): 
+AgeOffFilter removes entries with timestamps more than <ttl> milliseconds old
+----------> set org.apache.accumulo.core.iterators.filter.AgeOffFilter parameter currentTime, if set, use the given value as the absolute time in milliseconds as the current time of day: 
+----------> set org.apache.accumulo.core.iterators.filter.AgeOffFilter parameter ttl, time to live (milliseconds): 30000
+username@instance filtertest> 
+
+username@instance filtertest> flush -t filtertest
+08 11:13:55,745 [shell.Shell] INFO : Flush of table filtertest initiated...
+username@instance filtertest> compact -t filtertest
+08 11:14:10,800 [shell.Shell] INFO : Compaction of table filtertest scheduled for 20110208111410EST
+username@instance filtertest> 
+
+
+ +

After the compaction runs, the newly created files will not contain any data that should be aged off, and the +Accumulo garbage collector will remove the old files.

+ +

To see the iterator settings for a table, use:

+ +
username@instance filtertest> config -t filtertest -f iterator
+---------+------------------------------------------+----------------------------------------------------------
+SCOPE    | NAME                                     | VALUE
+---------+------------------------------------------+----------------------------------------------------------
+table    | table.iterator.majc.myfilter .............. | 10,org.apache.accumulo.core.iterators.FilteringIterator
+table    | table.iterator.majc.myfilter.opt.0 ........ | org.apache.accumulo.core.iterators.filter.AgeOffFilter
+table    | table.iterator.majc.myfilter.opt.0.ttl .... | 30000
+table    | table.iterator.majc.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
+table    | table.iterator.majc.vers.opt.maxVersions .. | 1
+table    | table.iterator.minc.myfilter .............. | 10,org.apache.accumulo.core.iterators.FilteringIterator
+table    | table.iterator.minc.myfilter.opt.0 ........ | org.apache.accumulo.core.iterators.filter.AgeOffFilter
+table    | table.iterator.minc.myfilter.opt.0.ttl .... | 30000
+table    | table.iterator.minc.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
+table    | table.iterator.minc.vers.opt.maxVersions .. | 1
+table    | table.iterator.scan.myfilter .............. | 10,org.apache.accumulo.core.iterators.FilteringIterator
+table    | table.iterator.scan.myfilter.opt.0 ........ | org.apache.accumulo.core.iterators.filter.AgeOffFilter
+table    | table.iterator.scan.myfilter.opt.0.ttl .... | 30000
+table    | table.iterator.scan.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
+table    | table.iterator.scan.vers.opt.maxVersions .. | 1
+---------+------------------------------------------+----------------------------------------------------------
+username@instance filtertest> 
+
+
+ +

If you would like to apply multiple filters, this can be done using a single +iterator. Just continue adding entries during the +“set org.apache.accumulo.core.iterators.FilteringIterator option” step. +Make sure to order the filterPriorityNumbers in the order you would like +the filters to be applied.
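
The priority ordering can be pictured as a chain: the FilteringIterator consults each configured filter in turn, and an entry survives only if every filter accepts it. A minimal sketch of that logic (generic predicates stand in for real Filter implementations):

```java
import java.util.List;
import java.util.function.Predicate;

public class FilterChainSketch {
    // Consult filters in priority order; an entry is delivered only if every
    // filter accepts it, so the cheapest or most selective filter should come first.
    public static <T> boolean accept(List<Predicate<T>> filtersInPriorityOrder, T entry) {
        for (Predicate<T> filter : filtersInPriorityOrder)
            if (!filter.test(entry)) return false;
        return true;
    }

    public static void main(String[] args) {
        List<Predicate<String>> filters = List.of(
            s -> s.startsWith("a"), // e.g. a cheap prefix filter first
            s -> s.length() > 2);   // then a second condition
        System.out.println(accept(filters, "abc")); // true
        System.out.println(accept(filters, "ab"));  // false
    }
}
```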

+ +
http://git-wip-us.apache.org/repos/asf/accumulo/blob/c0655661/1.3/user_manual/examples/helloworld.html
diff --git a/1.3/user_manual/examples/helloworld.html b/1.3/user_manual/examples/helloworld.html
new file mode 100644
index 0000000..a9db9fb
--- /dev/null
+++ b/1.3/user_manual/examples/helloworld.html
@@ -0,0 +1,238 @@

Hello World Example

+ +

This tutorial uses the following Java classes, which can be found in org.apache.accumulo.examples.helloworld in the accumulo-examples module:

+ +
  • InsertWithBatchWriter.java - Inserts 10K rows (50K entries) into Accumulo, with each row having 5 entries.
  • InsertWithOutputFormat.java - An example of inserting data via MapReduce.
  • ReadData.java - Reads all data between two rows.
+ +
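
The 50K-entry figure comes from 10K rows times 5 entries per row. A hypothetical sketch of the kind of keys such an ingest enumerates (the actual row and column layout in InsertWithBatchWriter may differ):

```java
import java.util.ArrayList;
import java.util.List;

public class HelloWorldKeysSketch {
    // Enumerate row/column labels for a rows-by-colsPerRow ingest; with
    // 10000 rows and 5 columns this yields the 50K entries mentioned above.
    public static List<String> keys(int rows, int colsPerRow) {
        List<String> out = new ArrayList<>();
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < colsPerRow; c++)
                out.add("row_" + r + " cf:col_" + c);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(keys(10000, 5).size()); // prints 50000
    }
}
```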

Log into the Accumulo shell:

+ +
$ ./bin/accumulo shell -u username -p password
+
+
+ +

Create a table called ‘hellotable’:

+ +
username@instance> createtable hellotable
+
+
+ +

Launch a Java program that inserts data with a BatchWriter:

+ +
$ ./bin/accumulo org.apache.accumulo.examples.helloworld.InsertWithBatchWriter instance zookeepers hellotable username password
+
+
+ +

Alternatively, the same data can be inserted using MapReduce writers:

+ +
$ ./bin/accumulo org.apache.accumulo.examples.helloworld.InsertWithOutputFormat instance zookeepers hellotable username password
+
+
+ +

On the Accumulo status page at the URL below (where ‘master’ is replaced with the name or IP of your Accumulo master), you should see 50K entries.

+ +
http://master:50095/
+
+
+ +

To view the entries, use the shell to scan the table:

+ +
username@instance> table hellotable
+username@instance hellotable> scan
+
+
+ +

You can also use a Java class to scan the table:

+ +
$ ./bin/accumulo org.apache.accumulo.examples.helloworld.ReadData instance zookeepers hellotable username password row_0 row_1001
+
+
+ +
http://git-wip-us.apache.org/repos/asf/accumulo/blob/c0655661/1.3/user_manual/examples/index.html
diff --git a/1.3/user_manual/examples/index.html b/1.3/user_manual/examples/index.html
new file mode 100644
index 0000000..93f5bd3
--- /dev/null
+++ b/1.3/user_manual/examples/index.html
@@ -0,0 +1,230 @@

Examples

+ +

Each README in the examples directory highlights the use of particular features of Apache Accumulo.

+ +

Before running any of the examples, the following steps must be performed.

+ +
  1. Install and run Accumulo via the instructions found in $ACCUMULO_HOME/README. Remember the instance name. It will be referred to as “instance” throughout the examples. A comma-separated list of zookeeper servers will be referred to as “zookeepers”.

  2. Create an Accumulo user (see the user manual), or use the root user. The Accumulo user name will be referred to as “username” with password “password” throughout the examples.
+ +

In all commands, you will need to replace “instance”, “zookeepers”, “username”, and “password” with the values you set for your Accumulo instance.

+ +

Commands intended to be run in bash are prefixed by ‘$’. These are always assumed to be run from the $ACCUMULO_HOME directory.

+ +

Commands intended to be run in the Accumulo shell are prefixed by ‘>’.

+ +

  • aggregation
  • batch
  • bloom
  • bulkIngest
  • constraints
  • dirlist
  • filter
  • helloworld
  • mapred
  • shard

+ + +
http://git-wip-us.apache.org/repos/asf/accumulo/blob/c0655661/1.3/user_manual/examples/mapred.html
diff --git a/1.3/user_manual/examples/mapred.html b/1.3/user_manual/examples/mapred.html
new file mode 100644
index 0000000..6a1bb88
--- /dev/null
+++ b/1.3/user_manual/examples/mapred.html
@@ -0,0 +1,263 @@

MapReduce Example

+ +

This example uses MapReduce and Accumulo to compute word counts for a set of documents. This is accomplished using a map-only MapReduce job and an Accumulo table with aggregators.

+ +

To run this example you will need a directory in HDFS containing text files. The Accumulo README will be used to show how to run this example.

+ +
$ hadoop fs -copyFromLocal $ACCUMULO_HOME/README /user/username/wc/Accumulo.README
+$ hadoop fs -ls /user/username/wc
+Found 1 items
+-rw-r--r--   2 username supergroup       9359 2009-07-15 17:54 /user/username/wc/Accumulo.README
+
+
+ +

The first part of running this example is to create a table with aggregation +for the column family count.
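
Conceptually, an additive string aggregator like StringSummation combines the stored value for a cell with each new value by parsing both as longs and adding them. A minimal stand-in (my naming, not the real class):

```java
public class StringSummationSketch {
    // Combine a stored value with an update the way an additive aggregator
    // would: parse both as decimal strings and emit their sum as a string.
    public static String aggregate(String current, String update) {
        return Long.toString(Long.parseLong(current) + Long.parseLong(update));
    }

    public static void main(String[] args) {
        // Two mutations for the same word collapse into one running count.
        System.out.println(aggregate("20", "7")); // prints 27
    }
}
```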

+ +
$ ./bin/accumulo shell -u username -p password
+Shell - Apache Accumulo Interactive Shell
+- version: 1.3.x-incubating
+- instance name: instance
+- instance id: 00000000-0000-0000-0000-000000000000
+- 
+- type 'help' for a list of available commands
+- 
+username@instance> createtable wordCount -a count=org.apache.accumulo.core.iterators.aggregation.StringSummation 
+username@instance wordCount> quit
+
+
+ +

After creating the table, run the word count MapReduce job.

+ +
[user1@instance accumulo]$ bin/tool.sh lib/accumulo-examples-*[^c].jar org.apache.accumulo.examples.mapreduce.WordCount instance zookeepers /user/user1/wc wordCount -u username -p password
+
+11/02/07 18:20:11 INFO input.FileInputFormat: Total input paths to process : 1
+11/02/07 18:20:12 INFO mapred.JobClient: Running job: job_201102071740_0003
+11/02/07 18:20:13 INFO mapred.JobClient:  map 0% reduce 0%
+11/02/07 18:20:20 INFO mapred.JobClient:  map 100% reduce 0%
+11/02/07 18:20:22 INFO mapred.JobClient: Job complete: job_201102071740_0003
+11/02/07 18:20:22 INFO mapred.JobClient: Counters: 6
+11/02/07 18:20:22 INFO mapred.JobClient:   Job Counters 
+11/02/07 18:20:22 INFO mapred.JobClient:     Launched map tasks=1
+11/02/07 18:20:22 INFO mapred.JobClient:     Data-local map tasks=1
+11/02/07 18:20:22 INFO mapred.JobClient:   FileSystemCounters
+11/02/07 18:20:22 INFO mapred.JobClient:     HDFS_BYTES_READ=10487
+11/02/07 18:20:22 INFO mapred.JobClient:   Map-Reduce Framework
+11/02/07 18:20:22 INFO mapred.JobClient:     Map input records=255
+11/02/07 18:20:22 INFO mapred.JobClient:     Spilled Records=0
+11/02/07 18:20:22 INFO mapred.JobClient:     Map output records=1452
+
+
+ +

After the MapReduce job completes, query the Accumulo table to see the word counts.

+ +
$ ./bin/accumulo shell -u username -p password
+username@instance> table wordCount
+username@instance wordCount> scan -b the
+the count:20080906 []    75
+their count:20080906 []    2
+them count:20080906 []    1
+then count:20080906 []    1
+there count:20080906 []    1
+these count:20080906 []    3
+this count:20080906 []    6
+through count:20080906 []    1
+time count:20080906 []    3
+time. count:20080906 []    1
+to count:20080906 []    27
+total count:20080906 []    1
+tserver, count:20080906 []    1
+tserver.compaction.major.concurrent.max count:20080906 []    1
+...
+
+
+ +
http://git-wip-us.apache.org/repos/asf/accumulo/blob/c0655661/1.3/user_manual/examples/shard.html
diff --git a/1.3/user_manual/examples/shard.html b/1.3/user_manual/examples/shard.html
new file mode 100644
index 0000000..666ac5f
--- /dev/null
+++ b/1.3/user_manual/examples/shard.html
@@ -0,0 +1,248 @@

Shard Example

+ +

Accumulo has an iterator called the intersecting iterator which supports querying a term index that is partitioned by +document, or “sharded”. This example shows how to use the intersecting iterator through these four programs:

+ +
  • Index.java - Indexes a set of text files into an Accumulo table.
  • Query.java - Finds documents containing a given set of terms.
  • Reverse.java - Reads the index table and writes a map of documents to terms into another table.
  • ContinuousQuery.java - Uses the table populated by Reverse.java to select N random terms per document, then continuously and randomly queries those terms.
+ +
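
Within a single shard, the intersecting iterator's job reduces to set intersection: for each query term, keep only the documents already seen under every previous term. A simplified in-memory sketch (row and column details omitted — the real iterator walks the term index for one partition at a time):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class IntersectSketch {
    // One shard of a document-partitioned index: term -> ids of the documents
    // in this shard that contain the term.
    public static Set<String> docsWithAllTerms(Map<String, Set<String>> shard, String... terms) {
        Set<String> result = null;
        for (String term : terms) {
            Set<String> docs = shard.getOrDefault(term, Set.of());
            if (result == null) result = new HashSet<>(docs);
            else result.retainAll(docs); // keep only docs matching every term so far
        }
        return result == null ? Set.of() : result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> shard = Map.of(
            "foo", Set.of("doc1", "doc2"),
            "bar", Set.of("doc2", "doc3"));
        System.out.println(docsWithAllTerms(shard, "foo", "bar")); // [doc2]
    }
}
```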

To run these example programs, create two tables like below.

+ +
username@instance> createtable shard
+username@instance shard> createtable doc2term
+
+
+ +

After creating the tables, index some files. The following command indexes all of the java files in the Accumulo source code.

+ +
$ cd /local/user1/workspace/accumulo/
+$ find src -name "*.java" | xargs ./bin/accumulo org.apache.accumulo.examples.shard.Index instance zookeepers shard username password 30
+
+
+ +

The following command queries the index to find all files containing ‘foo’ and ‘bar’.

+ +
$ cd $ACCUMULO_HOME
+$ ./bin/accumulo org.apache.accumulo.examples.shard.Query instance zookeepers shard username password foo bar
+/local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/security/ColumnVisibilityTest.java
+/local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/client/mock/MockConnectorTest.java
+/local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/security/VisibilityEvaluatorTest.java
+/local/user1/workspace/accumulo/src/server/src/main/java/accumulo/server/test/functional/RowDeleteTest.java
+/local/user1/workspace/accumulo/src/server/src/test/java/accumulo/server/logger/TestLogWriter.java
+/local/user1/workspace/accumulo/src/server/src/main/java/accumulo/server/test/functional/DeleteEverythingTest.java
+/local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/data/KeyExtentTest.java
+/local/user1/workspace/accumulo/src/server/src/test/java/accumulo/server/constraints/MetadataConstraintsTest.java
+/local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/iterators/WholeRowIteratorTest.java
+/local/user1/workspace/accumulo/src/server/src/test/java/accumulo/server/util/DefaultMapTest.java
+/local/user1/workspace/accumulo/src/server/src/test/java/accumulo/server/tabletserver/InMemoryMapTest.java
+
+
+ +

In order to run ContinuousQuery, we first need to run Reverse.java to populate the doc2term table.

+ +
$ ./bin/accumulo org.apache.accumulo.examples.shard.Reverse instance zookeepers shard doc2term username password
+
+
+ +

Below, ContinuousQuery is run using 5 terms: it selects 5 random terms from each document, then continually picks one of those sets of 5 terms at random and queries for it, printing the number of matching documents and the query time in seconds.

+ +
$ ./bin/accumulo org.apache.accumulo.examples.shard.ContinuousQuery instance zookeepers shard doc2term username password 5
+[public, core, class, binarycomparable, b] 2  0.081
+[wordtodelete, unindexdocument, doctablename, putdelete, insert] 1  0.041
+[import, columnvisibilityinterpreterfactory, illegalstateexception, cv, columnvisibility] 1  0.049
+[getpackage, testversion, util, version, 55] 1  0.048
+[for, static, println, public, the] 55  0.211
+[sleeptime, wrappingiterator, options, long, utilwaitthread] 1  0.057
+[string, public, long, 0, wait] 12  0.132
+
+
+ +
http://git-wip-us.apache.org/repos/asf/accumulo/blob/c0655661/1.3/user_manual/failure_handling.png
diff --git a/1.3/user_manual/failure_handling.png b/1.3/user_manual/failure_handling.png
new file mode 100644
index 0000000..90b9f0f
Binary files /dev/null and b/1.3/user_manual/failure_handling.png differ

http://git-wip-us.apache.org/repos/asf/accumulo/blob/c0655661/1.3/user_manual/img1.png
diff --git a/1.3/user_manual/img1.png b/1.3/user_manual/img1.png
new file mode 100644
index 0000000..8a5846c
Binary files /dev/null and b/1.3/user_manual/img1.png differ

http://git-wip-us.apache.org/repos/asf/accumulo/blob/c0655661/1.3/user_manual/img2.png
diff --git a/1.3/user_manual/img2.png b/1.3/user_manual/img2.png
new file mode 100644
index 0000000..cbfe2b3
Binary files /dev/null and b/1.3/user_manual/img2.png differ

http://git-wip-us.apache.org/repos/asf/accumulo/blob/c0655661/1.3/user_manual/img3.png
diff --git a/1.3/user_manual/img3.png b/1.3/user_manual/img3.png
new file mode 100644
index 0000000..3b6f1f2
Binary files /dev/null and b/1.3/user_manual/img3.png differ

http://git-wip-us.apache.org/repos/asf/accumulo/blob/c0655661/1.3/user_manual/img4.png
diff --git a/1.3/user_manual/img4.png b/1.3/user_manual/img4.png
new file mode 100644
index 0000000..5b0ceb2
Binary files /dev/null and b/1.3/user_manual/img4.png differ

http://git-wip-us.apache.org/repos/asf/accumulo/blob/c0655661/1.3/user_manual/img5.png
diff --git a/1.3/user_manual/img5.png b/1.3/user_manual/img5.png
new file mode 100644
index 0000000..83d8955
Binary files /dev/null and b/1.3/user_manual/img5.png differ