Added: websites/staging/accumulo/trunk/content/accumulo/1.4/examples/constraints.html ============================================================================== --- websites/staging/accumulo/trunk/content/accumulo/1.4/examples/constraints.html (added) +++ websites/staging/accumulo/trunk/content/accumulo/1.4/examples/constraints.html Fri Mar 23 19:08:12 2012 @@ -0,0 +1,140 @@ + + + + + + Apache Accumulo Constraints Example + + + + + + + + + +
+ +
+ +
+

Apache Accumulo Constraints Example

+

This tutorial uses the following Java classes, which can be found in org.apache.accumulo.examples.simple.constraints in the simple-examples module:

+ +

This is an example of how to create a table with constraints. Below, a table is created with two example constraints. One constraint does not allow non-alphanumeric keys. The other constraint does not allow non-numeric values. Two inserts that violate these constraints are attempted and denied. The scan at the end shows the inserts were not allowed.

+
$ ./bin/accumulo shell -u username -p password
+
+Shell - Apache Accumulo Interactive Shell
+- 
+- version: 1.4.x
+- instance name: instance
+- instance id: 00000000-0000-0000-0000-000000000000
+- 
+- type 'help' for a list of available commands
+- 
+username@instance> createtable testConstraints
+username@instance testConstraints> config -t testConstraints -s table.constraint.1=org.apache.accumulo.examples.simple.constraints.NumericValueConstraint
+username@instance testConstraints> config -t testConstraints -s table.constraint.2=org.apache.accumulo.examples.simple.constraints.AlphaNumKeyConstraint
+username@instance testConstraints> insert r1 cf1 cq1 1111
+username@instance testConstraints> insert r1 cf1 cq1 ABC
+  Constraint Failures:
+      ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.simple.constraints.NumericValueConstraint, violationCode:1, violationDescription:Value is not numeric, numberOfViolatingMutations:1)
+username@instance testConstraints> insert r1! cf1 cq1 ABC 
+  Constraint Failures:
+      ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.simple.constraints.NumericValueConstraint, violationCode:1, violationDescription:Value is not numeric, numberOfViolatingMutations:1)
+      ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.simple.constraints.AlphaNumKeyConstraint, violationCode:1, violationDescription:Row was not alpha numeric, numberOfViolatingMutations:1)
+username@instance testConstraints> scan
+r1 cf1:cq1 []    1111
+username@instance testConstraints>
+
+
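The two checks can be pictured with a short standalone sketch. It does not use the Accumulo API. The function names, and the rule that only the row and the value are inspected, are assumptions inferred from the violation descriptions in the session above, not the code of AlphaNumKeyConstraint or NumericValueConstraint.

```python
# Standalone sketch of the two checks; not the Accumulo Constraint API.
# Assumption: the row must be ASCII letters/digits and the value ASCII digits.


def is_alpha_num(text):
    """True if text is non-empty and made only of ASCII letters and digits."""
    return text.isascii() and text.isalnum()


def is_numeric(text):
    """True if text is non-empty and made only of ASCII digits."""
    return text.isascii() and text.isdigit()


def check_insert(row, value):
    """Return the violation descriptions an insert would trigger."""
    violations = []
    if not is_numeric(value):
        violations.append("Value is not numeric")
    if not is_alpha_num(row):
        violations.append("Row was not alpha numeric")
    return violations


if __name__ == "__main__":
    # The three inserts attempted in the shell session.
    for row, value in [("r1", "1111"), ("r1", "ABC"), ("r1!", "ABC")]:
        print(row, value, check_insert(row, value))
```

Under these assumptions the first insert yields no violations, the second yields one, and the third yields two, which mirrors the shell output.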
+ + + + + Added: websites/staging/accumulo/trunk/content/accumulo/1.4/examples/dirlist.html ============================================================================== --- websites/staging/accumulo/trunk/content/accumulo/1.4/examples/dirlist.html (added) +++ websites/staging/accumulo/trunk/content/accumulo/1.4/examples/dirlist.html Fri Mar 23 19:08:12 2012 @@ -0,0 +1,199 @@ + + + + + + Apache Accumulo File System Archive + + + + + + + + + +
+ +
+ +
+

Apache Accumulo File System Archive

+

This example stores filesystem information in Accumulo. The example stores the information in the following three tables. More information about the table structures can be found at the end of README.dirlist.

+ +

This example shows how to use Accumulo to store a file system history. It has the following classes:

+ +

To begin, ingest some data with Ingest.java.

+
$ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.Ingest instance zookeepers username password dirTable indexTable dataTable exampleVis 100000 /local/username/workspace
+
+ + +

This may take some time if there are large files in the /local/username/workspace directory. If you use 0 instead of 100000 on the command line, the ingest will run much faster, but it will not put any file data into Accumulo (the dataTable will be empty). Note that running this example will create the tables dirTable, indexTable, and dataTable in Accumulo; you should delete them when you have completed the example. If you modify a file or add new files in the ingested directory (e.g. /local/username/workspace), you can run Ingest again to add the new information to the Accumulo tables.

+

To browse the data ingested, use Viewer.java. Be sure to give the "username" user the authorizations to see the data (in this case, run "setauths -u username -s exampleVis" in the shell, and use the string "exampleVis" as the "auths" in command lines below).

+
$ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.Viewer instance zookeepers username password dirTable dataTable auths /local/username/workspace
+
+ + +

To list the contents of specific directories, use QueryUtil.java.

+
$ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil instance zookeepers username password dirTable auths /local/username
+$ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil instance zookeepers username password dirTable auths /local/username/workspace
+
+ + +

To perform searches on file or directory names, also use QueryUtil.java. Search terms must contain no more than one wild card and cannot contain "/". Note that these queries run on the indexTable table instead of the dirTable table.

+
$ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil instance zookeepers username password indexTable exampleVis filename -search
+$ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil instance zookeepers username password indexTable exampleVis 'filename*' -search
+$ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil instance zookeepers username password indexTable exampleVis '*jar' -search
+$ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil instance zookeepers username password indexTable exampleVis 'filename*jar' -search
+
+ + +

To count the number of direct children (directories and files) and descendants (children and children's descendants, directories and files), run FileCount over the dirTable table. The results are written back to the same table. Because FileCount both reads from and writes to Accumulo, it requires scan authorizations for the read and a visibility for the data written. In this example, the authorizations and visibility are set to the same value, exampleVis. See README.visibility for more information on visibility and authorizations.

+
$ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.FileCount instance zookeepers username password dirTable exampleVis exampleVis
+
+ + +
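The difference between direct children and descendants can be illustrated with a small in-memory sketch. This is not how FileCount is implemented (it reads and writes dirTable); the function name and the sample paths are hypothetical.

```python
from collections import defaultdict


def count_entries(paths):
    """Map each directory to (direct children, all descendants)."""
    children = defaultdict(int)
    descendants = defaultdict(int)
    for path in paths:
        parts = path.strip("/").split("/")
        # Every ancestor directory gains one descendant for this entry.
        for depth in range(1, len(parts)):
            ancestor = "/" + "/".join(parts[:depth])
            descendants[ancestor] += 1
        # Only the immediate parent gains a direct child.
        if len(parts) > 1:
            parent = "/" + "/".join(parts[:-1])
            children[parent] += 1
    return {d: (children[d], descendants[d]) for d in descendants}


if __name__ == "__main__":
    sample = ["/local", "/local/a", "/local/a/x.txt",
              "/local/a/y.txt", "/local/README"]
    for directory, counts in sorted(count_entries(sample).items()):
        print(directory, counts)
```

Here /local has two direct children (a and README) but four descendants, because the two files under /local/a are counted as well.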

Directory Table

+

Here is an illustration of what data looks like in the directory table:

+
row colf:colq [vis] value
+000 dir:exec [exampleVis]    true
+000 dir:hidden [exampleVis]    false
+000 dir:lastmod [exampleVis]    1291996886000
+000 dir:length [exampleVis]    1666
+001/local dir:exec [exampleVis]    true
+001/local dir:hidden [exampleVis]    false
+001/local dir:lastmod [exampleVis]    1304945270000
+001/local dir:length [exampleVis]    272
+002/local/Accumulo.README \x7F\xFF\xFE\xCFH\xA1\x82\x97:exec [exampleVis]    false
+002/local/Accumulo.README \x7F\xFF\xFE\xCFH\xA1\x82\x97:hidden [exampleVis]    false
+002/local/Accumulo.README \x7F\xFF\xFE\xCFH\xA1\x82\x97:lastmod [exampleVis]    1308746481000
+002/local/Accumulo.README \x7F\xFF\xFE\xCFH\xA1\x82\x97:length [exampleVis]    9192
+002/local/Accumulo.README \x7F\xFF\xFE\xCFH\xA1\x82\x97:md5 [exampleVis]    274af6419a3c4c4a259260ac7017cbf1
+
+ + +

The rows are of the form depth + path, where depth is the number of slashes ("/") in the path padded to 3 digits. This is so that all the children of a directory appear as consecutive keys in Accumulo; without the depth, you would, for example, see all the subdirectories of /local before you saw /usr. For directories, the column family is "dir". For files, the column family is Long.MAX_VALUE - lastModified, stored as bytes rather than in string format, so that newer versions sort earlier.
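The key layout can be sketched in a few lines. The function names are hypothetical, and two details are assumptions inferred from the sample rows rather than taken from the example's source: the root directory is treated as an empty path, and the file column family is an 8-byte big-endian long.

```python
import struct

LONG_MAX = 2**63 - 1  # Java's Long.MAX_VALUE


def dir_row(path):
    """Row key: number of '/' in the path, zero-padded to 3 digits, then the path."""
    return "%03d%s" % (path.count("/"), path)


def file_column_family(last_modified_ms):
    """Long.MAX_VALUE - lastModified as 8 big-endian bytes (assumed encoding)."""
    return struct.pack(">q", LONG_MAX - last_modified_ms)


if __name__ == "__main__":
    # Depth-prefixed rows keep /usr next to /local, ahead of /local's children.
    print(sorted(dir_row(p) for p in ["/usr", "/local/Accumulo.README", "/local"]))
    # lastmod of /local/Accumulo.README in the sample table.
    print(file_column_family(1308746481000))
```

With the lastmod value 1308746481000 from the sample, this encoding produces the bytes 7F FF FE CF 48 A1 82 97, which is the column family shown for /local/Accumulo.README (0x48 prints as "H"). Because the subtraction inverts the order, a later modification time yields a smaller byte string and therefore sorts first.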

+

Index Table

+

Here is an illustration of what data looks like in the index table:

+
row colf:colq [vis]
+fAccumulo.README i:002/local/Accumulo.README [exampleVis]
+flocal i:001/local [exampleVis]
+rEMDAER.olumuccA i:002/local/Accumulo.README [exampleVis]
+rlacol i:001/local [exampleVis]
+
+ + +

The values of the index table are null. The rows are of the form "f" + filename or "r" + reverse file name. This is to enable searches with wildcards at the beginning, middle, or end.
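A sketch of how the two row forms support wild card searches follows. The function names are hypothetical, and the mapping from a search term to a row prefix is an assumption about how such an index can be queried, not a transcription of QueryUtil. A wild card in the middle of a term is left out of the sketch.

```python
def index_rows(filename):
    """Forward and reverse index rows for one file or directory name."""
    return ["f" + filename, "r" + filename[::-1]]


def search_prefix(term):
    """Map a term with a leading or trailing '*' to the row prefix to scan."""
    if term.count("*") > 1 or "/" in term:
        raise ValueError("at most one wild card and no '/'")
    if term.endswith("*"):
        return "f" + term[:-1]        # 'filename*' scans forward rows
    if term.startswith("*"):
        return "r" + term[1:][::-1]   # '*jar' scans reversed rows
    if "*" in term:
        raise ValueError("middle wild cards are not covered by this sketch")
    return "f" + term                 # exact name


if __name__ == "__main__":
    print(index_rows("Accumulo.README"))
    print(search_prefix("*jar"))
```

Reversing the name turns a leading wild card into a trailing one, so a term such as '*jar' becomes a prefix scan over the "r" rows.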

+

Data Table

+

Here is an illustration of what data looks like in the data table:

+
row colf:colq [vis] value
+274af6419a3c4c4a259260ac7017cbf1 refs:e77276a2b56e5c15b540eaae32b12c69\x00filext [exampleVis]    README
+274af6419a3c4c4a259260ac7017cbf1 refs:e77276a2b56e5c15b540eaae32b12c69\x00name [exampleVis]    /local/Accumulo.README
+274af6419a3c4c4a259260ac7017cbf1 ~chunk:\x00\x0FB@\x00\x00\x00\x00 [exampleVis]    *******************************************************************************\x0A1. Building\x0A\x0AIn the normal tarball or RPM release of accumulo, [truncated]
+274af6419a3c4c4a259260ac7017cbf1 ~chunk:\x00\x0FB@\x00\x00\x00\x01 [exampleVis]
+
+ + +

The rows are the md5 hash of the file. Some column family : column qualifier pairs are "refs" : hash of file name + null byte + property name, in which case the value is the property value. There can be multiple references to the same file, which are distinguished by the hash of the file name. Other column family : column qualifier pairs are "~chunk" : chunk size in bytes + chunk number in bytes, in which case the value is the bytes for that chunk of the file. There is an end-of-file data marker whose chunk number is the number of chunks for the file and whose value is empty.
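The chunk layout can be sketched as follows. The function name is hypothetical, and the qualifier encoding (two 4-byte big-endian integers) is an assumption inferred from the sample rows, where \x00\x0FB@ is the 4-byte form of 1000000.

```python
import hashlib
import struct


def chunk_entries(data, chunk_size):
    """Yield (row, column family, column qualifier, value) tuples for one file."""
    row = hashlib.md5(data).hexdigest()
    count = 0
    for start in range(0, len(data), chunk_size):
        # Qualifier: chunk size, then chunk number (assumed 4 bytes each).
        qualifier = struct.pack(">ii", chunk_size, count)
        yield row, "~chunk", qualifier, data[start:start + chunk_size]
        count += 1
    # End-of-file marker: chunk number equals the number of chunks, empty value.
    yield row, "~chunk", struct.pack(">ii", chunk_size, count), b""


if __name__ == "__main__":
    for entry in chunk_entries(b"abcdefgh", 3):
        print(entry)
```

Because the row is the hash of the content, two files with identical bytes map to the same row, which is why duplicate data is stored only once.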

+

There may exist multiple copies of the same file (with the same md5 hash) with different chunk sizes or different visibilities. There is an iterator that can be set on the data table that combines these copies into a single copy with a visibility taken from the visibilities of the file references, e.g. (vis from ref1)|(vis from ref2).

+
+ + + + + Added: websites/staging/accumulo/trunk/content/accumulo/1.4/examples/filedata.html ============================================================================== --- websites/staging/accumulo/trunk/content/accumulo/1.4/examples/filedata.html (added) +++ websites/staging/accumulo/trunk/content/accumulo/1.4/examples/filedata.html Fri Mar 23 19:08:12 2012 @@ -0,0 +1,136 @@ + + + + + + Apache Accumulo File System Archive Example (Data Only) + + + + + + + + + +
+ +
+ +
+

Apache Accumulo File System Archive Example (Data Only)

+

This example archives file data into an Accumulo table. Files with duplicate data are only stored once. The example has the following classes:

+ +

This example is coupled with the dirlist example. See README.dirlist for instructions.

+

If you haven't already run the README.dirlist example, ingest a file with FileDataIngest.

+
$ ./bin/accumulo org.apache.accumulo.examples.simple.filedata.FileDataIngest instance zookeepers username password dataTable exampleVis 1000 $ACCUMULO_HOME/README
+
+ + +

Open the accumulo shell and look at the data. The row is the MD5 hash of the file, which you can verify by running a command such as 'md5sum' on the file.

+
> scan -t dataTable
+
+ + +

Run the CharacterHistogram MapReduce to add some information about the file.

+
$ bin/tool.sh lib/examples-simple*[^c].jar org.apache.accumulo.examples.simple.filedata.CharacterHistogram instance zookeepers username password dataTable exampleVis exampleVis
+
+ + +

Scan again to see the histogram stored in the 'info' column family.

+
> scan -t dataTable
+
+
+ + + + + Added: websites/staging/accumulo/trunk/content/accumulo/1.4/examples/filter.html ============================================================================== --- websites/staging/accumulo/trunk/content/accumulo/1.4/examples/filter.html (added) +++ websites/staging/accumulo/trunk/content/accumulo/1.4/examples/filter.html Fri Mar 23 19:08:12 2012 @@ -0,0 +1,197 @@ + + + + + + Apache Accumulo Filter Example + + + + + + + + + +
+ +
+ +
+

Apache Accumulo Filter Example

+

This is a simple filter example. It uses the AgeOffFilter that is provided as part of the core package org.apache.accumulo.core.iterators.user. Filters are iterators that select desired key/value pairs (or weed out undesired ones). Filters extend the org.apache.accumulo.core.iterators.Filter class and must implement a method accept(Key k, Value v). This method returns true if the key/value pair is to be delivered and false if it is to be ignored. Filter takes a "negate" parameter which defaults to false. If set to true, the return value of the accept method is negated, so that key/value pairs accepted by the method are omitted by the Filter.
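The accept/negate contract can be pictured with a standalone sketch. These classes only mimic the behavior described above; they are not the Accumulo Filter or AgeOffFilter classes, and the dict-based key and the apply method are hypothetical.

```python
class Filter:
    """Base class: subclasses implement accept(key, value)."""

    def __init__(self, negate=False):
        self.negate = negate

    def accept(self, key, value):
        raise NotImplementedError

    def apply(self, entries):
        """Return the entries the filter delivers, honoring negate."""
        # accept() != negate flips the decision when negate is True.
        return [(k, v) for k, v in entries if self.accept(k, v) != self.negate]


class AgeOffFilter(Filter):
    """Accept entries whose timestamp is at most ttl milliseconds old."""

    def __init__(self, ttl, current_time, negate=False):
        super().__init__(negate)
        self.ttl = ttl
        self.current_time = current_time

    def accept(self, key, value):
        # In this sketch the key is a dict carrying a 'timestamp' in ms.
        return self.current_time - key["timestamp"] <= self.ttl


if __name__ == "__main__":
    entries = [({"row": "foo", "timestamp": 1000}, "c"),
               ({"row": "bar", "timestamp": 40000}, "d")]
    print(AgeOffFilter(ttl=30000, current_time=45000).apply(entries))
```

With a ttl of 30000 and a current time of 45000, the entry stamped 1000 is dropped and the entry stamped 40000 is kept; setting negate to True reverses that outcome.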

+
username@instance> createtable filtertest
+username@instance filtertest> setiter -t filtertest -scan -p 10 -n myfilter -ageoff
+AgeOffFilter removes entries with timestamps more than <ttl> milliseconds old
+----------> set AgeOffFilter parameter negate, default false keeps k/v that pass accept method, true rejects k/v that pass accept method: 
+----------> set AgeOffFilter parameter ttl, time to live (milliseconds): 30000
+----------> set AgeOffFilter parameter currentTime, if set, use the given value as the absolute time in milliseconds as the current time of day: 
+username@instance filtertest> scan
+username@instance filtertest> insert foo a b c
+username@instance filtertest> scan
+foo a:b []    c
+username@instance filtertest>
+
+ + +

... wait 30 seconds ...

+
username@instance filtertest> scan
+username@instance filtertest>
+
+ + +

Note the absence of the entry inserted more than 30 seconds ago. Since the scope was set to "scan", the entry is still in Accumulo, but is being filtered out at query time. To delete entries from Accumulo based on the ages of their timestamps, AgeOffFilters should be set up for the "minc" and "majc" scopes as well.

+

To force an ageoff of the persisted data, after setting up the ageoff iterator on the "minc" and "majc" scopes you can flush and compact your table. This will happen automatically as a background operation on any table that is being actively written to, but can also be requested in the shell.

+

The first setiter command used the special -ageoff flag to specify the AgeOffFilter, but any Filter can be configured by using the -class flag. The following commands show how to enable the AgeOffFilter for the minc and majc scopes using the -class flag, then flush and compact the table.

+
username@instance filtertest> setiter -t filtertest -minc -majc -p 10 -n myfilter -class org.apache.accumulo.core.iterators.user.AgeOffFilter
+AgeOffFilter removes entries with timestamps more than <ttl> milliseconds old
+----------> set AgeOffFilter parameter negate, default false keeps k/v that pass accept method, true rejects k/v that pass accept method: 
+----------> set AgeOffFilter parameter ttl, time to live (milliseconds): 30000
+----------> set AgeOffFilter parameter currentTime, if set, use the given value as the absolute time in milliseconds as the current time of day: 
+username@instance filtertest> flush
+06 10:42:24,806 [shell.Shell] INFO : Flush of table filtertest initiated...
+username@instance filtertest> compact
+06 10:42:36,781 [shell.Shell] INFO : Compaction of table filtertest started for given range
+username@instance filtertest> flush -t filtertest -w
+06 10:42:52,881 [shell.Shell] INFO : Flush of table filtertest completed.
+username@instance filtertest> compact -t filtertest -w
+06 10:43:00,632 [shell.Shell] INFO : Compacting table ...
+06 10:43:01,307 [shell.Shell] INFO : Compaction of table filtertest completed for given range
+username@instance filtertest>
+
+ + +

By default, flush and compact execute in the background, but with the -w flag they will wait to return until the operation has completed. Both are demonstrated above, though only one call to each would be necessary. A specific table can be specified with -t.

+

After the compaction runs, the newly created files will not contain any data that should have been aged off, and the Accumulo garbage collector will remove the old files.

+

To see the iterator settings for a table, use config.

+
username@instance filtertest> config -t filtertest -f iterator
+---------+---------------------------------------------+---------------------------------------------------------------------------
+SCOPE    | NAME                                        | VALUE
+---------+---------------------------------------------+---------------------------------------------------------------------------
+table    | table.iterator.majc.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
+table    | table.iterator.majc.myfilter.opt.ttl ...... | 30000
+table    | table.iterator.majc.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
+table    | table.iterator.majc.vers.opt.maxVersions .. | 1
+table    | table.iterator.minc.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
+table    | table.iterator.minc.myfilter.opt.ttl ...... | 30000
+table    | table.iterator.minc.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
+table    | table.iterator.minc.vers.opt.maxVersions .. | 1
+table    | table.iterator.scan.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter
+table    | table.iterator.scan.myfilter.opt.ttl ...... | 30000
+table    | table.iterator.scan.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator
+table    | table.iterator.scan.vers.opt.maxVersions .. | 1
+---------+---------------------------------------------+---------------------------------------------------------------------------
+username@instance filtertest>
+
+ + +

When setting new iterators, make sure to order their priority numbers (specified with -p) in the order you would like the iterators to be applied. Also, each iterator must have a unique name and priority within each scope.
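The uniqueness and ordering rules can be expressed as a small check. This is only an illustration with a hypothetical function name; it assumes that lower priority numbers are applied first, and it is not how Accumulo validates iterator settings.

```python
def order_iterators(settings):
    """Validate (priority, name) pairs for one scope and return application order."""
    priorities = [p for p, _ in settings]
    names = [n for _, n in settings]
    if len(set(priorities)) != len(priorities):
        raise ValueError("duplicate priority in scope")
    if len(set(names)) != len(names):
        raise ValueError("duplicate name in scope")
    # Assumption: lower priority numbers are applied first.
    return [name for _, name in sorted(settings)]


if __name__ == "__main__":
    # The scan scope from the config listing above.
    print(order_iterators([(20, "vers"), (10, "myfilter")]))
```

Under that assumption, the listing above would apply myfilter (priority 10) before vers (priority 20) in each scope.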

+
+ + + + + Added: websites/staging/accumulo/trunk/content/accumulo/1.4/examples/helloworld.html ============================================================================== --- websites/staging/accumulo/trunk/content/accumulo/1.4/examples/helloworld.html (added) +++ websites/staging/accumulo/trunk/content/accumulo/1.4/examples/helloworld.html Fri Mar 23 19:08:12 2012 @@ -0,0 +1,145 @@ + + + + + + Apache Accumulo Hello World Example + + + + + + + + + +
+ +
+ +
+

Apache Accumulo Hello World Example

+

This tutorial uses the following Java classes, which can be found in org.apache.accumulo.examples.simple.helloworld in the simple-examples module:

+ +

Log into the accumulo shell:

+
$ ./bin/accumulo shell -u username -p password
+
+ + +

Create a table called 'hellotable':

+
username@instance> createtable hellotable
+
+ + +

Launch a Java program that inserts data with a BatchWriter:

+
$ ./bin/accumulo org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter instance zookeepers username password hellotable
+
+ + +

Alternatively, the same data can be inserted using MapReduce writers:

+
$ ./bin/accumulo org.apache.accumulo.examples.simple.helloworld.InsertWithOutputFormat instance zookeepers username password hellotable
+
+ + +

On the Accumulo status page at the URL below (where 'master' is replaced with the name or IP of your Accumulo master), you should see 50K entries:

+
http://master:50095/
+
+ + +

To view the entries, use the shell to scan the table:

+
username@instance> table hellotable
+username@instance hellotable> scan
+
+ + +

You can also use a Java class to scan the table:

+
$ ./bin/accumulo org.apache.accumulo.examples.simple.helloworld.ReadData instance zookeepers username password hellotable row_0 row_1001
+
+
+ + + + + Added: websites/staging/accumulo/trunk/content/accumulo/1.4/examples/index.html ============================================================================== --- websites/staging/accumulo/trunk/content/accumulo/1.4/examples/index.html (added) +++ websites/staging/accumulo/trunk/content/accumulo/1.4/examples/index.html Fri Mar 23 19:08:12 2012 @@ -0,0 +1,135 @@ + + + + + + Apache Accumulo Examples + + + + + + + + + +
+ +
+ +
+

Apache Accumulo Examples

+

Each README in the examples directory highlights the use of particular features of Apache Accumulo.

+

Before running any of the examples, the following steps must be performed.

+
  1. Install and run Accumulo via the instructions found in $ACCUMULO_HOME/README. Remember the instance name. It will be referred to as "instance" throughout the examples. A comma-separated list of zookeeper servers will be referred to as "zookeepers".

  2. Create an Accumulo user (see the user manual), or use the root user. The Accumulo user name will be referred to as "username" with password "password" throughout the examples. This user will need to have the ability to create tables.
+

In all commands, you will need to replace "instance", "zookeepers", "username", and "password" with the values you set for your Accumulo instance.

+

Commands intended to be run in bash are prefixed by '$'. These are always assumed to be run from the $ACCUMULO_HOME directory.

+

Commands intended to be run in the Accumulo shell are prefixed by '>'.

+

batch

+

bloom

+

bulkIngest

+

combiner

+

constraints

+

dirlist

+

filedata

+

filter

+

helloworld

+

isolation

+

mapred

+

shard

+

visibility

+
+ + + + +