Subject [3/3] git commit: Merge branch '1.5.1-SNAPSHOT' into 1.6.0-SNAPSHOT
Date Thu, 05 Dec 2013 16:57:10 GMT
Merge branch '1.5.1-SNAPSHOT' into 1.6.0-SNAPSHOT


Branch: refs/heads/1.6.0-SNAPSHOT
Commit: 1bddc574086129aca3484a6070aee257c8622085
Parents: 0d49819 00fb08b
Author: Christopher Tubbs <>
Authored: Thu Dec 5 11:55:58 2013 -0500
Committer: Christopher Tubbs <>
Committed: Thu Dec 5 11:55:58 2013 -0500

 .../apache/accumulo/examples/simple/filedata/ | 2 +-
 .../apache/accumulo/examples/simple/filedata/  | 2 +-
 server/monitor/src/main/resources/docs/examples/README.filedata  | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)
diff --cc server/monitor/src/main/resources/docs/examples/README.filedata
index 946ca8c,0000000..9f0016e
mode 100644,000000..100644
--- a/server/monitor/src/main/resources/docs/examples/README.filedata
+++ b/server/monitor/src/main/resources/docs/examples/README.filedata
@@@ -1,47 -1,0 +1,47 @@@
 +Title: Apache Accumulo File System Archive Example (Data Only)
 +Notice:    Licensed to the Apache Software Foundation (ASF) under one
 +           or more contributor license agreements.  See the NOTICE file
 +           distributed with this work for additional information
 +           regarding copyright ownership.  The ASF licenses this file
 +           to you under the Apache License, Version 2.0 (the
 +           "License"); you may not use this file except in compliance
 +           with the License.  You may obtain a copy of the License at
 +           .
 +           .
 +           Unless required by applicable law or agreed to in writing,
 +           software distributed under the License is distributed on an
 +           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 +           KIND, either express or implied.  See the License for the
 +           specific language governing permissions and limitations
 +           under the License.
 +This example archives file data into an Accumulo table.  Files with duplicate data are only stored once.
 +The example has the following classes:
 + * CharacterHistogram - A MapReduce that computes a histogram of byte frequency for each file and stores the histogram alongside the file data.  An example use of the ChunkInputFormat.
 + * ChunkCombiner - An Iterator that dedupes file data and sets their visibilities to a combined visibility based on current references to the file data.
 + * ChunkInputFormat - An Accumulo InputFormat that provides keys containing file info (List<Entry<Key,Value>>) and values with an InputStream over the file (ChunkInputStream).
 + * ChunkInputStream - An input stream over file data stored in Accumulo.
-  * FileDataIngest - Takes a list of files and archives them into Accumulo keyed on the SHA1 hashes of the files.
-  * FileDataQuery - Retrieves file data based on the SHA1 hash of the file. (Used by the dirlist.Viewer.)
++ * FileDataIngest - Takes a list of files and archives them into Accumulo keyed on hashes of the files.
++ * FileDataQuery - Retrieves file data based on the hash of the file. (Used by the dirlist.Viewer.)
 + * KeyUtil - A utility for creating and parsing null-byte separated strings into/from Text
 + * VisibilityCombiner - A utility for merging visibilities into the form (VIS1)|(VIS2)|...
 +This example is coupled with the dirlist example.  See README.dirlist for instructions.
 +If you haven't already run the README.dirlist example, ingest a file with FileDataIngest.
 +    $ ./bin/accumulo org.apache.accumulo.examples.simple.filedata.FileDataIngest -i instance -z zookeepers -u username -p password -t dataTable --auths exampleVis --chunk 1000 $ACCUMULO_HOME/README
 +Open the accumulo shell and look at the data.  The row is the MD5 hash of the file, which you can verify by running a command such as 'md5sum' on the file.
 +    > scan -t dataTable
 +Run the CharacterHistogram MapReduce to add some information about the file.
 +    $ bin/ lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.filedata.CharacterHistogram -i instance -z zookeepers -u username -p password -t dataTable --auths exampleVis --vis exampleVis
 +Scan again to see the histogram stored in the 'info' column family.
 +    > scan -t dataTable
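
The README above says the row is the MD5 hash of the file and suggests checking it with 'md5sum'. Where md5sum is not available, the same check can be sketched in Java. This is only an illustration and is not part of the commit: the class name Md5RowCheck and the method md5Hex are hypothetical, and it assumes the row ID is the lowercase hex digest in the form md5sum prints.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the Accumulo examples.
public class Md5RowCheck {

    // Returns the MD5 digest of the given bytes as lowercase hex,
    // assumed here to match the row ID shown by "scan -t dataTable".
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Md5RowCheck <file>");
            return;
        }
        // Reads the whole file into memory, which is fine for small files
        // such as the README ingested in the example.
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(md5Hex(data));
    }
}
```

Running it against the ingested file and comparing the printed value with the row in the scan output gives the same check as md5sum.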

