Added: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/examples/filter.html ============================================================================== --- websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/examples/filter.html (added) +++ websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/examples/filter.html Tue Nov 1 17:08:17 2011 @@ -0,0 +1,203 @@ + + + + + + Filter Example + + + + + + + + + +
+

Filter Example

+

This is a simple filter example. It uses the AgeOffFilter that is provided as part of the core package org.apache.accumulo.core.iterators.filter. Filters are used by the FilteringIterator to select desired key/value pairs (or weed out undesired ones). Filters implement the org.apache.accumulo.core.iterators.filter.Filter interface, which contains a method accept(Key k, Value v). This method returns true if the key/value pair is to be delivered and false if it is to be ignored.
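The accept() contract can be illustrated with a minimal, self-contained sketch of age-off logic. The SimpleKey class and field names below are hypothetical stand-ins for illustration only, not the actual Accumulo Key/Value API:

```java
// Minimal sketch of the Filter accept() contract, modeled on AgeOffFilter.
// SimpleKey is a hypothetical stand-in for Accumulo's Key class.
public class AgeOffSketch {
    static class SimpleKey {
        final long timestamp; // milliseconds since the epoch
        SimpleKey(long timestamp) { this.timestamp = timestamp; }
    }

    final long ttl;         // time to live, in milliseconds
    final long currentTime; // "now", in milliseconds

    AgeOffSketch(long ttl, long currentTime) {
        this.ttl = ttl;
        this.currentTime = currentTime;
    }

    // Returns true if the key/value pair should be delivered,
    // false if it has aged off and should be ignored.
    boolean accept(SimpleKey k, byte[] v) {
        return currentTime - k.timestamp <= ttl;
    }

    public static void main(String[] args) {
        AgeOffSketch filter = new AgeOffSketch(30000, 100000);
        System.out.println(filter.accept(new SimpleKey(90000), new byte[0])); // true: 10s old
        System.out.println(filter.accept(new SimpleKey(50000), new byte[0])); // false: 50s old
    }
}
```

With ttl set to 30000 as in the shell session below, an entry older than 30 seconds is rejected.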

+
username@instance> createtable filtertest
+username@instance filtertest> setiter -t filtertest -scan -p 10 -n myfilter -filter
+FilteringIterator uses Filters to accept or reject key/value pairs
+----------> entering options: <filterPriorityNumber> <ageoff|regex|filterClass>
+----------> set org.apache.accumulo.core.iterators.FilteringIterator option (<name> <value>, hit enter to skip): 0 ageoff
+----------> set org.apache.accumulo.core.iterators.FilteringIterator option (<name> <value>, hit enter to skip): 
+AgeOffFilter removes entries with timestamps more than <ttl> milliseconds old
+----------> set org.apache.accumulo.core.iterators.filter.AgeOffFilter parameter currentTime, if set, use the given value as the absolute time in milliseconds as the current time 
 of day: 
+----------> set org.apache.accumulo.core.iterators.filter.AgeOffFilter parameter ttl, time to live (milliseconds): 30000
+username@instance filtertest>
+
+username@instance filtertest> scan
+username@instance filtertest> insert foo a b c
+username@instance filtertest> scan
+foo a:b []  c
+
+ + +

... wait 30 seconds ...

+
username@instance filtertest> scan
+username@instance filtertest>
+
+ + +

Note the absence of the entry inserted more than 30 seconds ago. Since the scope was set to "scan", this means the entry is still in Accumulo, but is being filtered out at query time. To delete entries from Accumulo based on the ages of their timestamps, AgeOffFilters should be set up for the "minc" and "majc" scopes, as well.

+

To force an ageoff in the persisted data, set up the ageoff iterator on the "minc" and "majc" scopes and then flush and compact your table. Compaction happens automatically as a background operation on any table that is being actively written to, but these are the commands to force it:

+
username@instance filtertest> setiter -t filtertest -scan -minc -majc -p 10 -n myfilter -filter
+FilteringIterator uses Filters to accept or reject key/value pairs
+----------> entering options: <filterPriorityNumber> <ageoff|regex|filterClass>
+----------> set org.apache.accumulo.core.iterators.FilteringIterator option (<name> <value>, hit enter to skip): 0 ageoff
+----------> set org.apache.accumulo.core.iterators.FilteringIterator option (<name> <value>, hit enter to skip): 
+AgeOffFilter removes entries with timestamps more than <ttl> milliseconds old
+----------> set org.apache.accumulo.core.iterators.filter.AgeOffFilter parameter currentTime, if set, use the given value as the absolute time in milliseconds as the current time 
 of day: 
+----------> set org.apache.accumulo.core.iterators.filter.AgeOffFilter parameter ttl, time to live (milliseconds): 30000
+username@instance filtertest>
+
+username@instance filtertest> flush -t filtertest
+08 11:13:55,745 [shell.Shell] INFO : Flush of table filtertest initiated...
+username@instance filtertest> compact -t filtertest
+08 11:14:10,800 [shell.Shell] INFO : Compaction of table filtertest scheduled for 20110208111410EST
+username@instance filtertest>
+
+ + +

After the compaction runs, the newly created files will not contain any data that should be aged off, and the Accumulo garbage collector will remove the old files.

+

To see the iterator settings for a table, use:

+
username@instance filtertest> config -t filtertest -f iterator
+---------+------------------------------------------+----------------------------------------------------------
+SCOPE    | NAME                                     | VALUE
+---------+------------------------------------------+----------------------------------------------------------
+table    | table.iterator.majc.myfilter .............. | 10,org.apache.accumulo.core.iterators.FilteringIterator
+table    | table.iterator.majc.myfilter.opt.0 ........ | org.apache.accumulo.core.iterators.filter.AgeOffFilter
+table    | table.iterator.majc.myfilter.opt.0.ttl .... | 30000
+table    | table.iterator.majc.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
+table    | table.iterator.majc.vers.opt.maxVersions .. | 1
+table    | table.iterator.minc.myfilter .............. | 10,org.apache.accumulo.core.iterators.FilteringIterator
+table    | table.iterator.minc.myfilter.opt.0 ........ | org.apache.accumulo.core.iterators.filter.AgeOffFilter
+table    | table.iterator.minc.myfilter.opt.0.ttl .... | 30000
+table    | table.iterator.minc.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
+table    | table.iterator.minc.vers.opt.maxVersions .. | 1
+table    | table.iterator.scan.myfilter .............. | 10,org.apache.accumulo.core.iterators.FilteringIterator
+table    | table.iterator.scan.myfilter.opt.0 ........ | org.apache.accumulo.core.iterators.filter.AgeOffFilter
+table    | table.iterator.scan.myfilter.opt.0.ttl .... | 30000
+table    | table.iterator.scan.vers .................. | 20,org.apache.accumulo.core.iterators.VersioningIterator
+table    | table.iterator.scan.vers.opt.maxVersions .. | 1
+---------+------------------------------------------+----------------------------------------------------------
+username@instance filtertest>
+
+ + +

If you would like to apply multiple filters, this can be done using a single iterator. Just continue adding entries during the "set org.apache.accumulo.core.iterators.FilteringIterator option" step. Make sure to order the filterPriorityNumbers in the order you would like the filters to be applied.
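Conceptually, the filtering iterator applies its configured filters in priority order, and a key/value pair is delivered only if every filter accepts it. A minimal, self-contained sketch of that chaining (the Entry class and predicates are hypothetical stand-ins, not the Accumulo API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Sketch of how a filtering iterator chains filters: a pair survives
// only if every filter's accept() returns true, checked in order.
// Entry is a hypothetical stand-in for an Accumulo key/value pair.
public class FilterChainSketch {
    static class Entry {
        final long timestamp;
        final String value;
        Entry(long timestamp, String value) { this.timestamp = timestamp; this.value = value; }
    }

    // Apply filters in priority order; reject on the first failure.
    static boolean accept(List<Predicate<Entry>> filters, Entry e) {
        for (Predicate<Entry> f : filters) {
            if (!f.test(e)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        long now = 100000;
        Predicate<Entry> ageOff = e -> now - e.timestamp <= 30000; // like ageoff with ttl=30000
        Predicate<Entry> regex = e -> e.value.matches("foo.*");    // like the regex filter
        List<Predicate<Entry>> filters = Arrays.asList(ageOff, regex);

        System.out.println(accept(filters, new Entry(90000, "foobar"))); // true: passes both
        System.out.println(accept(filters, new Entry(90000, "baz")));    // false: fails regex
        System.out.println(accept(filters, new Entry(10000, "foobar"))); // false: aged off
    }
}
```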

+
+ + + + + Added: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/examples/helloworld.html ============================================================================== --- websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/examples/helloworld.html (added) +++ websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/examples/helloworld.html Tue Nov 1 17:08:17 2011 @@ -0,0 +1,154 @@ + + + + + + Hello World Example + + + + + + + + + +
+

Hello World Example

+

This tutorial uses the following Java classes, which can be found in org.apache.accumulo.examples.helloworld in the accumulo-examples module:

+ +

Log into the accumulo shell:

+
$ ./bin/accumulo shell -u username -p password
+
+ + +

Create a table called 'hellotable':

+
username@instance> createtable hellotable
+
+ + +

Launch a Java program that inserts data with a BatchWriter:

+
$ ./bin/accumulo org.apache.accumulo.examples.helloworld.InsertWithBatchWriter instance zookeepers hellotable username password
+
+ + +
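The batching pattern that InsertWithBatchWriter relies on can be sketched in a few lines: mutations are buffered client side and sent to the servers in batches. This toy writer is a hypothetical illustration of the pattern only, not the Accumulo client API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the batch-writing pattern: buffer mutations, send a batch
// when the buffer fills, and flush the remainder on close. The class
// and method names are stand-ins, not the Accumulo client API.
public class BatchWriterSketch {
    final List<String> buffer = new ArrayList<>();
    final int batchSize;
    int flushes = 0;

    BatchWriterSketch(int batchSize) { this.batchSize = batchSize; }

    void addMutation(String mutation) {
        buffer.add(mutation);
        if (buffer.size() >= batchSize) flush();
    }

    void flush() {
        if (!buffer.isEmpty()) {
            flushes++; // a real writer would send the batch to the servers here
            buffer.clear();
        }
    }

    void close() { flush(); }

    public static void main(String[] args) {
        BatchWriterSketch bw = new BatchWriterSketch(25000);
        for (int i = 0; i < 50000; i++) bw.addMutation("row_" + i);
        bw.close();
        System.out.println(bw.flushes); // 2
    }
}
```

Buffering amortizes the per-request cost, which is why a single client can insert the 50K entries of this example quickly.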

Alternatively, the same data can be inserted using MapReduce writers:

+
$ ./bin/accumulo org.apache.accumulo.examples.helloworld.InsertWithOutputFormat instance zookeepers hellotable username password
+
+ + +

On the Accumulo status page at the URL below (where 'master' is replaced with the name or IP of your Accumulo master), you should see 50K entries.

+
http://master:50095/
+
+ + +

To view the entries, use the shell to scan the table:

+
username@instance> table hellotable
+username@instance hellotable> scan
+
+ + +

You can also use a Java class to scan the table:

+
$ ./bin/accumulo org.apache.accumulo.examples.helloworld.ReadData instance zookeepers hellotable username password row_0 row_1001
+
+
+ + + + + Added: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/examples/mapred.html ============================================================================== --- websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/examples/mapred.html (added) +++ websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/examples/mapred.html Tue Nov 1 17:08:17 2011 @@ -0,0 +1,183 @@ + + + + + + MapReduce Example + + + + + + + + + +
+

MapReduce Example

+

This example uses MapReduce and Accumulo to compute word counts for a set of documents. This is accomplished using a map-only MapReduce job and an Accumulo table with aggregators.
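The combination of a map-only job and a summing aggregator can be sketched without a cluster: each occurrence of a word contributes a 1, and entries with identical keys have their values summed by the table. The class below is an in-memory illustration of that idea, not the example's actual MapReduce code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the word count pattern: a map-only job emits one
// (word, "1") entry per occurrence, and the table's summing
// aggregator combines values for identical keys. The HashMap
// stands in for the aggregation the table performs.
public class WordCountSketch {
    static Map<String, Long> countWords(String text) {
        Map<String, Long> counts = new HashMap<>();
        for (String word : text.split("\\s+")) {
            if (word.isEmpty()) continue;
            // Each mapper output contributes a 1; the aggregator sums them.
            counts.merge(word, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countWords("the quick the lazy the");
        System.out.println(counts.get("the"));   // 3
        System.out.println(counts.get("quick")); // 1
    }
}
```

Because summation is associative, no reduce phase is needed; the table does the combining as entries arrive.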

+

To run this example you will need a directory in HDFS containing text files. The Accumulo README will be used to show how to run this example.

+
$ hadoop fs -copyFromLocal $ACCUMULO_HOME/README /user/username/wc/Accumulo.README
+$ hadoop fs -ls /user/username/wc
+Found 1 items
+-rw-r--r--   2 username supergroup       9359 2009-07-15 17:54 /user/username/wc/Accumulo.README
+
+ + +

The first part of running this example is to create a table with aggregation for the column family 'count'.

+
$ ./bin/accumulo shell -u username -p password
+Shell - Accumulo Interactive Shell
+- version: 1.3.x-incubating
+- instance name: instance
+- instance id: 00000000-0000-0000-0000-000000000000
+- 
+- type 'help' for a list of available commands
+- 
+username@instance> createtable wordCount -a count=org.apache.accumulo.core.iterators.aggregation.StringSummation 
+username@instance wordCount> quit
+
+ + +

After creating the table, run the word count MapReduce job.

+
[user1@instance accumulo]$ bin/tool.sh lib/accumulo-examples-*[^c].jar org.apache.accumulo.examples.mapreduce.WordCount instance zookeepers /user/user1/wc wordCount -u username -p password
+
+11/02/07 18:20:11 INFO input.FileInputFormat: Total input paths to process : 1
+11/02/07 18:20:12 INFO mapred.JobClient: Running job: job_201102071740_0003
+11/02/07 18:20:13 INFO mapred.JobClient:  map 0% reduce 0%
+11/02/07 18:20:20 INFO mapred.JobClient:  map 100% reduce 0%
+11/02/07 18:20:22 INFO mapred.JobClient: Job complete: job_201102071740_0003
+11/02/07 18:20:22 INFO mapred.JobClient: Counters: 6
+11/02/07 18:20:22 INFO mapred.JobClient:   Job Counters 
+11/02/07 18:20:22 INFO mapred.JobClient:     Launched map tasks=1
+11/02/07 18:20:22 INFO mapred.JobClient:     Data-local map tasks=1
+11/02/07 18:20:22 INFO mapred.JobClient:   FileSystemCounters
+11/02/07 18:20:22 INFO mapred.JobClient:     HDFS_BYTES_READ=10487
+11/02/07 18:20:22 INFO mapred.JobClient:   Map-Reduce Framework
+11/02/07 18:20:22 INFO mapred.JobClient:     Map input records=255
+11/02/07 18:20:22 INFO mapred.JobClient:     Spilled Records=0
+11/02/07 18:20:22 INFO mapred.JobClient:     Map output records=1452
+
+ + +

After the MapReduce job completes, query the Accumulo table to see the word counts.

+
$ ./bin/accumulo shell -u username -p password
+username@instance> table wordCount
+username@instance wordCount> scan -b the
+the count:20080906 []    75
+their count:20080906 []    2
+them count:20080906 []    1
+then count:20080906 []    1
+there count:20080906 []    1
+these count:20080906 []    3
+this count:20080906 []    6
+through count:20080906 []    1
+time count:20080906 []    3
+time. count:20080906 []    1
+to count:20080906 []    27
+total count:20080906 []    1
+tserver, count:20080906 []    1
+tserver.compaction.major.concurrent.max count:20080906 []    1
+...
+
+
+ + + + + Added: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/examples/shard.html ============================================================================== --- websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/examples/shard.html (added) +++ websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/examples/shard.html Tue Nov 1 17:08:17 2011 @@ -0,0 +1,166 @@ + + + + + + Shard Example + + + + + + + + + +
+

Shard Example

+

Accumulo has an iterator called the intersecting iterator which supports querying a term index that is partitioned by document, or "sharded". This example shows how to use the intersecting iterator through these four programs:

+ +
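What the intersecting iterator computes can be sketched with in-memory maps standing in for the sharded table (hypothetical types for illustration; the real iterator does this server side over sorted key/value pairs, one shard per row):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of a document-partitioned ("sharded") term index query:
// within each shard, find the documents that contain every query
// term, then union the per-shard results.
public class IntersectSketch {
    // index: shard -> term -> documents in that shard containing the term
    static Set<String> query(Map<String, Map<String, Set<String>>> index, String... terms) {
        Set<String> results = new TreeSet<>();
        for (Map<String, Set<String>> shard : index.values()) {
            Set<String> docs = null;
            for (String term : terms) {
                Set<String> termDocs = shard.getOrDefault(term, new HashSet<>());
                if (docs == null) docs = new HashSet<>(termDocs);
                else docs.retainAll(termDocs); // intersect within the shard
            }
            if (docs != null) results.addAll(docs);
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Set<String>>> index = new HashMap<>();
        Map<String, Set<String>> shard0 = new HashMap<>();
        shard0.put("foo", new HashSet<>(java.util.Arrays.asList("A.java", "B.java")));
        shard0.put("bar", new HashSet<>(java.util.Arrays.asList("B.java")));
        index.put("shard0", shard0);
        System.out.println(query(index, "foo", "bar")); // [B.java]
    }
}
```

Partitioning by document means each shard can compute its intersection independently, which is what lets the query run in parallel across tablet servers.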

To run these example programs, create two tables as shown below.

+
username@instance> createtable shard
+username@instance shard> createtable doc2term
+
+ + +

After creating the tables, index some files. The following command indexes all of the Java files in the Accumulo source code.

+
$ cd /local/user1/workspace/accumulo/
+$ find src -name "*.java" | xargs ./bin/accumulo org.apache.accumulo.examples.shard.Index instance zookeepers shard username password 30
+
+ + +

The following command queries the index to find all files containing 'foo' and 'bar'.

+
$ cd $ACCUMULO_HOME
+$ ./bin/accumulo org.apache.accumulo.examples.shard.Query instance zookeepers shard username password foo bar
+/local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/security/ColumnVisibilityTest.java
+/local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/client/mock/MockConnectorTest.java
+/local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/security/VisibilityEvaluatorTest.java
+/local/user1/workspace/accumulo/src/server/src/main/java/accumulo/server/test/functional/RowDeleteTest.java
+/local/user1/workspace/accumulo/src/server/src/test/java/accumulo/server/logger/TestLogWriter.java
+/local/user1/workspace/accumulo/src/server/src/main/java/accumulo/server/test/functional/DeleteEverythingTest.java
+/local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/data/KeyExtentTest.java
+/local/user1/workspace/accumulo/src/server/src/test/java/accumulo/server/constraints/MetadataConstraintsTest.java
+/local/user1/workspace/accumulo/src/core/src/test/java/accumulo/core/iterators/WholeRowIteratorTest.java
+/local/user1/workspace/accumulo/src/server/src/test/java/accumulo/server/util/DefaultMapTest.java
+/local/user1/workspace/accumulo/src/server/src/test/java/accumulo/server/tabletserver/InMemoryMapTest.java
+
+ + +

In order to run ContinuousQuery, we first need to run Reverse.java to populate the doc2term table.

+
$ ./bin/accumulo org.apache.accumulo.examples.shard.Reverse instance zookeepers shard doc2term username password
+
+ + +
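What Reverse produces can be sketched as inverting the shard table's term-to-document mapping into the document-to-term mapping stored in doc2term (the in-memory maps here are stand-ins for the two Accumulo tables):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of the doc2term construction: the shard table maps
// term -> documents, and doc2term is that mapping inverted,
// document -> terms, so a random document's terms can be sampled.
public class ReverseSketch {
    static Map<String, Set<String>> invert(Map<String, Set<String>> termToDocs) {
        Map<String, Set<String>> docToTerms = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : termToDocs.entrySet()) {
            for (String doc : e.getValue()) {
                docToTerms.computeIfAbsent(doc, d -> new HashSet<>()).add(e.getKey());
            }
        }
        return docToTerms;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = new HashMap<>();
        index.put("foo", new HashSet<>(java.util.Arrays.asList("A.java", "B.java")));
        index.put("bar", new HashSet<>(java.util.Arrays.asList("A.java")));
        System.out.println(invert(index).get("A.java")); // contains foo and bar
    }
}
```

ContinuousQuery needs this inverted view so it can pick a document, take some of its terms, and query the shard table for documents containing all of them.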

Below, ContinuousQuery is run using 5 terms. It selects 5 random terms from each document, then continually picks one random set of 5 terms and queries for the documents containing all of them. It prints the number of matching documents and the query time in seconds.

+
$ ./bin/accumulo org.apache.accumulo.examples.shard.ContinuousQuery instance zookeepers shard doc2term username password 5
+[public, core, class, binarycomparable, b] 2  0.081
+[wordtodelete, unindexdocument, doctablename, putdelete, insert] 1  0.041
+[import, columnvisibilityinterpreterfactory, illegalstateexception, cv, columnvisibility] 1  0.049
+[getpackage, testversion, util, version, 55] 1  0.048
+[for, static, println, public, the] 55  0.211
+[sleeptime, wrappingiterator, options, long, utilwaitthread] 1  0.057
+[string, public, long, 0, wait] 12  0.132
+
+
+ + + + + Added: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/failure_handling.png ============================================================================== Binary file - no diff available. Propchange: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/failure_handling.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/img1.png ============================================================================== Binary file - no diff available. Propchange: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/img1.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/img2.png ============================================================================== Binary file - no diff available. Propchange: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/img2.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/img3.png ============================================================================== Binary file - no diff available. Propchange: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/img3.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/img4.png ============================================================================== Binary file - no diff available. 
Propchange: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/img4.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/img5.png ============================================================================== Binary file - no diff available. Propchange: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/img5.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/index.html ============================================================================== --- websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/index.html (added) +++ websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/index.html Tue Nov 1 17:08:17 2011 @@ -0,0 +1,134 @@ + + + + + + Accumulo User Manual: index + + + + + + + + + +
+

Accumulo User Manual: index

+

Next: Contents
+

+

Version 1.3

+
+

+ +
+
+ + + + +