Return-Path:
X-Original-To: apmail-accumulo-commits-archive@www.apache.org
Delivered-To: apmail-accumulo-commits-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by minotaur.apache.org (Postfix) with SMTP id DB9901088C
	for ; Thu, 5 Dec 2013 16:57:11 +0000 (UTC)
Received: (qmail 221 invoked by uid 500); 5 Dec 2013 16:57:11 -0000
Delivered-To: apmail-accumulo-commits-archive@accumulo.apache.org
Received: (qmail 195 invoked by uid 500); 5 Dec 2013 16:57:10 -0000
Mailing-List: contact commits-help@accumulo.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@accumulo.apache.org
Delivered-To: mailing list commits@accumulo.apache.org
Received: (qmail 99921 invoked by uid 99); 5 Dec 2013 16:57:08 -0000
Received: from tyr.zones.apache.org (HELO tyr.zones.apache.org) (140.211.11.114)
	by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 05 Dec 2013 16:57:08 +0000
Received: by tyr.zones.apache.org (Postfix, from userid 65534)
	id 3F7C081BCCE; Thu, 5 Dec 2013 16:57:08 +0000 (UTC)
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
From: ctubbsii@apache.org
To: commits@accumulo.apache.org
Date: Thu, 05 Dec 2013 16:57:10 -0000
Message-Id: <86f7e7636cc94de69dbdcd89f8d6ad5a@git.apache.org>
In-Reply-To:
References:
X-Mailer: ASF-Git Admin Mailer
Subject: [3/3] git commit: Merge branch '1.5.1-SNAPSHOT' into 1.6.0-SNAPSHOT

Merge branch '1.5.1-SNAPSHOT' into 1.6.0-SNAPSHOT


Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/1bddc574
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/1bddc574
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/1bddc574

Branch: refs/heads/1.6.0-SNAPSHOT
Commit: 1bddc574086129aca3484a6070aee257c8622085
Parents: 0d49819 00fb08b
Author: Christopher Tubbs
Authored: Thu Dec 5 11:55:58 2013 -0500
Committer: Christopher Tubbs
Committed: Thu Dec 5 11:55:58 2013 -0500

----------------------------------------------------------------------
 .../apache/accumulo/examples/simple/filedata/FileDataIngest.java | 2 +-
 .../apache/accumulo/examples/simple/filedata/FileDataQuery.java  | 2 +-
 server/monitor/src/main/resources/docs/examples/README.filedata  | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/accumulo/blob/1bddc574/examples/simple/src/main/java/org/apache/accumulo/examples/simple/filedata/FileDataQuery.java
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/accumulo/blob/1bddc574/server/monitor/src/main/resources/docs/examples/README.filedata
----------------------------------------------------------------------
diff --cc server/monitor/src/main/resources/docs/examples/README.filedata
index 946ca8c,0000000..9f0016e
mode 100644,000000..100644
--- a/server/monitor/src/main/resources/docs/examples/README.filedata
+++ b/server/monitor/src/main/resources/docs/examples/README.filedata
@@@ -1,47 -1,0 +1,47 @@@
 +Title: Apache Accumulo File System Archive Example (Data Only)
 +Notice:    Licensed to the Apache Software Foundation (ASF) under one
 +           or more contributor license agreements.  See the NOTICE file
 +           distributed with this work for additional information
 +           regarding copyright ownership.  The ASF licenses this file
 +           to you under the Apache License, Version 2.0 (the
 +           "License"); you may not use this file except in compliance
 +           with the License.  You may obtain a copy of the License at
 +           .
 +             http://www.apache.org/licenses/LICENSE-2.0
 +           .
 +           Unless required by applicable law or agreed to in writing,
 +           software distributed under the License is distributed on an
 +           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 +           KIND, either express or implied.  See the License for the
 +           specific language governing permissions and limitations
 +           under the License.
 +
 +This example archives file data into an Accumulo table.  Files with duplicate data are only stored once.
 +The example has the following classes:
 +
 + * CharacterHistogram - A MapReduce that computes a histogram of byte frequency for each file and stores the histogram alongside the file data.  An example use of the ChunkInputFormat.
 + * ChunkCombiner - An Iterator that dedupes file data and sets their visibilities to a combined visibility based on current references to the file data.
 + * ChunkInputFormat - An Accumulo InputFormat that provides keys containing file info (List<Entry<Key,Value>>) and values with an InputStream over the file (ChunkInputStream).
 + * ChunkInputStream - An input stream over file data stored in Accumulo.
 - * FileDataIngest - Takes a list of files and archives them into Accumulo keyed on the SHA1 hashes of the files.
 - * FileDataQuery - Retrieves file data based on the SHA1 hash of the file. (Used by the dirlist.Viewer.)
++ * FileDataIngest - Takes a list of files and archives them into Accumulo keyed on hashes of the files.
++ * FileDataQuery - Retrieves file data based on the hash of the file. (Used by the dirlist.Viewer.)
 + * KeyUtil - A utility for creating and parsing null-byte separated strings into/from Text objects.
 + * VisibilityCombiner - A utility for merging visibilities into the form (VIS1)|(VIS2)|...
 +
 +This example is coupled with the dirlist example.  See README.dirlist for instructions.
 +
 +If you haven't already run the README.dirlist example, ingest a file with FileDataIngest.
 +
 +    $ ./bin/accumulo org.apache.accumulo.examples.simple.filedata.FileDataIngest -i instance -z zookeepers -u username -p password -t dataTable --auths exampleVis --chunk 1000 $ACCUMULO_HOME/README
 +
 +Open the accumulo shell and look at the data.  The row is the MD5 hash of the file, which you can verify by running a command such as 'md5sum' on the file.
 +
 +    > scan -t dataTable
 +
 +Run the CharacterHistogram MapReduce to add some information about the file.
 +
 +    $ bin/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.filedata.CharacterHistogram -i instance -z zookeepers -u username -p password -t dataTable --auths exampleVis --vis exampleVis
 +
 +Scan again to see the histogram stored in the 'info' column family.
 +
 +    > scan -t dataTable
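
The README in this diff says the row is the MD5 hash of the file and suggests checking it with 'md5sum'. The same check can be sketched in Java with the JDK's MessageDigest. This is an illustrative sketch only: the class name Md5RowCheck and its methods are hypothetical and are not part of the Accumulo examples, and it assumes the row is written as the lowercase hex digest, which is the form 'md5sum' prints.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the Accumulo examples.
final class Md5RowCheck {

  // Returns the MD5 digest of the given bytes as lowercase hex.
  // Assumption: this matches the row format seen in the scan output.
  static String md5Hex(byte[] data) {
    try {
      MessageDigest md = MessageDigest.getInstance("MD5");
      byte[] digest = md.digest(data);
      StringBuilder sb = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        // Mask to an unsigned value so negative bytes print as two hex digits.
        sb.append(String.format("%02x", b & 0xff));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // MD5 is required on every Java platform, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  // Usage: java Md5RowCheck <path-to-file>
  public static void main(String[] args) throws IOException {
    if (args.length != 1) {
      System.err.println("usage: Md5RowCheck <file>");
      return;
    }
    // Reads the whole file into memory; fine for a small file such as a README.
    byte[] contents = Files.readAllBytes(Paths.get(args[0]));
    System.out.println(md5Hex(contents));
  }
}
```

Running it on the ingested file should print a value that can be compared by eye with the row shown by the scan, under the assumptions stated above.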