From: ctubbsii@apache.org To: commits@accumulo.apache.org Date: Tue, 24 Sep 2013 23:00:18 -0000 Message-Id: <99e754eb9c5742f099ac109c252874d1@git.apache.org> Subject: [2/5] ACCUMULO-1490 Move html and config docs to monitor http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/examples/README.combiner ---------------------------------------------------------------------- diff --git a/server/src/main/resources/docs/examples/README.combiner b/server/src/main/resources/docs/examples/README.combiner new file mode 100644 index 0000000..d1ba6e9 --- /dev/null +++ b/server/src/main/resources/docs/examples/README.combiner @@ -0,0 +1,70 @@ +Title: Apache Accumulo Combiner Example +Notice: Licensed to the Apache Software Foundation
(ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + . + http://www.apache.org/licenses/LICENSE-2.0 + . + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +This tutorial uses the following Java class, which can be found in org.apache.accumulo.examples.simple.combiner in the examples-simple module: + + * StatsCombiner.java - a combiner that calculates max, min, sum, and count + +This is a simple combiner example. To build this example run maven and then +copy the produced jar into the accumulo lib dir. This is already done in the +tar distribution. + + $ bin/accumulo shell -u username + Enter current password for 'username'@'instance': *** + + Shell - Apache Accumulo Interactive Shell + - + - version: 1.5.0 + - instance name: instance + - instance id: 00000000-0000-0000-0000-000000000000 + - + - type 'help' for a list of available commands + - + username@instance> createtable runners + username@instance runners> setiter -t runners -p 10 -scan -minc -majc -n decStats -class org.apache.accumulo.examples.simple.combiner.StatsCombiner + Combiner that keeps track of min, max, sum, and count + ----------> set StatsCombiner parameter all, set to true to apply Combiner to every column, otherwise leave blank. 
if true, columns option will be ignored.: + ----------> set StatsCombiner parameter columns, <col fam>[:<col qual>]{,<col fam>[:<col qual>]} escape non aplhanum chars using %.: stat + ----------> set StatsCombiner parameter radix, radix/base of the numbers: 10 + username@instance runners> setiter -t runners -p 11 -scan -minc -majc -n hexStats -class org.apache.accumulo.examples.simple.combiner.StatsCombiner + Combiner that keeps track of min, max, sum, and count + ----------> set StatsCombiner parameter all, set to true to apply Combiner to every column, otherwise leave blank. if true, columns option will be ignored.: + ----------> set StatsCombiner parameter columns, <col fam>[:<col qual>]{,<col fam>[:<col qual>]} escape non aplhanum chars using %.: hstat + ----------> set StatsCombiner parameter radix, radix/base of the numbers: 16 + username@instance runners> insert 123456 name first Joe + username@instance runners> insert 123456 stat marathon 240 + username@instance runners> scan + 123456 name:first [] Joe + 123456 stat:marathon [] 240,240,240,1 + username@instance runners> insert 123456 stat marathon 230 + username@instance runners> insert 123456 stat marathon 220 + username@instance runners> scan + 123456 name:first [] Joe + 123456 stat:marathon [] 220,240,690,3 + username@instance runners> insert 123456 hstat virtualMarathon 6a + username@instance runners> insert 123456 hstat virtualMarathon 6b + username@instance runners> scan + 123456 hstat:virtualMarathon [] 6a,6b,d5,2 + 123456 name:first [] Joe + 123456 stat:marathon [] 220,240,690,3 + +In this example, a table is created and the example stats combiner is applied to +the column families stat and hstat. The stats combiner computes min, max, sum, and +count. It can be configured to use a different base or radix. In the example +above, the column family stat is configured for base 10 and the column family +hstat is configured for base 16.
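The reduction the shell session demonstrates can be sketched in plain Java. This is an illustrative helper (a hypothetical StatsSketch class, not the actual StatsCombiner source), assuming values are longs encoded in the configured radix:

```java
// Sketch of the min/max/sum/count reduction a stats combiner performs.
// Hypothetical helper for illustration only; not the StatsCombiner code.
class StatsSketch {
    // Combine the current encoded stat ("min,max,sum,count" in the given radix,
    // or null if no stat exists yet) with one newly inserted raw value.
    public static String combine(String current, String insert, int radix) {
        long v = Long.parseLong(insert, radix);
        if (current == null) // first value seen for this column
            return encode(v, v, v, 1, radix);
        String[] p = current.split(",");
        long min = Long.parseLong(p[0], radix);
        long max = Long.parseLong(p[1], radix);
        long sum = Long.parseLong(p[2], radix);
        long count = Long.parseLong(p[3], radix);
        return encode(Math.min(min, v), Math.max(max, v), sum + v, count + 1, radix);
    }

    private static String encode(long min, long max, long sum, long count, int radix) {
        return Long.toString(min, radix) + "," + Long.toString(max, radix) + ","
            + Long.toString(sum, radix) + "," + Long.toString(count, radix);
    }

    public static void main(String[] args) {
        String s = combine(null, "240", 10);
        s = combine(s, "230", 10);
        s = combine(s, "220", 10);
        System.out.println(s); // 220,240,690,3
    }
}
```

Feeding the three marathon inserts through this reduction reproduces the scanned value 220,240,690,3; the hex column family works the same way with radix 16.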
http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/examples/README.constraints ---------------------------------------------------------------------- diff --git a/server/src/main/resources/docs/examples/README.constraints b/server/src/main/resources/docs/examples/README.constraints new file mode 100644 index 0000000..4a73f45 --- /dev/null +++ b/server/src/main/resources/docs/examples/README.constraints @@ -0,0 +1,54 @@ +Title: Apache Accumulo Constraints Example +Notice: Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + . + http://www.apache.org/licenses/LICENSE-2.0 + . + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +This tutorial uses the following Java classes, which can be found in org.apache.accumulo.examples.simple.constraints in the examples-simple module: + + * AlphaNumKeyConstraint.java - a constraint that requires alphanumeric keys + * NumericValueConstraint.java - a constraint that requires numeric string values + +This is an example of how to create a table with constraints. Below, a table is created with two example constraints: one rejects non-alphanumeric keys, and the other rejects non-numeric values. Two inserts that violate these constraints are attempted and denied. The scan at the end shows the inserts were not allowed.
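The two checks described above amount to simple character tests. A minimal sketch (a hypothetical ConstraintSketch helper; the real example classes implement Accumulo's Constraint interface and report numeric violation codes per mutation):

```java
// Illustrative predicates for the two example constraints.
// Hypothetical helper; not the AlphaNumKeyConstraint/NumericValueConstraint code.
class ConstraintSketch {
    // AlphaNumKeyConstraint idea: every character of the key must be alphanumeric.
    public static boolean isAlphaNumeric(String key) {
        for (int i = 0; i < key.length(); i++)
            if (!Character.isLetterOrDigit(key.charAt(i)))
                return false;
        return true;
    }

    // NumericValueConstraint idea: the value must be a non-empty string of digits.
    public static boolean isNumeric(String value) {
        if (value.isEmpty())
            return false;
        for (int i = 0; i < value.length(); i++)
            if (value.charAt(i) < '0' || value.charAt(i) > '9')
                return false;
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isNumeric("1111"));     // true: the accepted insert
        System.out.println(isNumeric("ABC"));      // false: "Value is not numeric"
        System.out.println(isAlphaNumeric("r1!")); // false: "Row was not alpha numeric"
    }
}
```

In the shell session below, the first insert passes both predicates, while the later inserts fail one or both, producing the ConstraintViolationSummary messages shown.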
+ + $ ./bin/accumulo shell -u username -p password + + Shell - Apache Accumulo Interactive Shell + - + - version: 1.5.0 + - instance name: instance + - instance id: 00000000-0000-0000-0000-000000000000 + - + - type 'help' for a list of available commands + - + username@instance> createtable testConstraints + username@instance testConstraints> constraint -a org.apache.accumulo.examples.simple.constraints.NumericValueConstraint + username@instance testConstraints> constraint -a org.apache.accumulo.examples.simple.constraints.AlphaNumKeyConstraint + username@instance testConstraints> insert r1 cf1 cq1 1111 + username@instance testConstraints> insert r1 cf1 cq1 ABC + Constraint Failures: + ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.simple.constraints.NumericValueConstraint, violationCode:1, violationDescription:Value is not numeric, numberOfViolatingMutations:1) + username@instance testConstraints> insert r1! cf1 cq1 ABC + Constraint Failures: + ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.simple.constraints.NumericValueConstraint, violationCode:1, violationDescription:Value is not numeric, numberOfViolatingMutations:1) + ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.simple.constraints.AlphaNumKeyConstraint, violationCode:1, violationDescription:Row was not alpha numeric, numberOfViolatingMutations:1) + username@instance testConstraints> scan + r1 cf1:cq1 [] 1111 + username@instance testConstraints> + http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/examples/README.dirlist ---------------------------------------------------------------------- diff --git a/server/src/main/resources/docs/examples/README.dirlist b/server/src/main/resources/docs/examples/README.dirlist new file mode 100644 index 0000000..60d3233 --- /dev/null +++ b/server/src/main/resources/docs/examples/README.dirlist @@ -0,0 +1,114 @@ +Title: Apache Accumulo File System Archive 
+Notice: Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + . + http://www.apache.org/licenses/LICENSE-2.0 + . + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +This example stores filesystem information in Accumulo. The example stores the information in the following three tables. More information about the table structures can be found at the end of README.dirlist. + + * directory table : This table stores information about the filesystem directory structure. + * index table : This table stores a file name index. It can be used to quickly find files with a given name, suffix, or prefix. + * data table : This table stores the file data. Files with duplicate data are only stored once. + +This example shows how to use Accumulo to store a file system history. It has the following classes: + + * Ingest.java - Recursively lists the files and directories under a given path, ingests their names and file info into one Accumulo table, indexes the file names in a separate table, and ingests the file data into a third table. + * QueryUtil.java - Provides utility methods for getting the info for a file, listing the contents of a directory, and performing single wild card searches on file or directory names. + * Viewer.java - Provides a GUI for browsing the file system information stored in Accumulo.
+ * FileCount.java - Computes recursive counts over file system information and stores them back into the same Accumulo table. + +To begin, ingest some data with Ingest.java. + + $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.Ingest -i instance -z zookeepers -u username -p password --vis exampleVis --chunkSize 100000 /local/username/workspace + +This may take some time if there are large files in the /local/username/workspace directory. If you use 0 instead of 100000 on the command line, the ingest will run much faster, but it will not put any file data into Accumulo (the dataTable will be empty). +Note that running this example will create tables dirTable, indexTable, and dataTable in Accumulo that you should delete when you have completed the example. +If you modify a file or add new files in the directory ingested (e.g. /local/username/workspace), you can run Ingest again to add new information into the Accumulo tables. + +To browse the data ingested, use Viewer.java. Be sure to give the "username" user the authorizations to see the data (in this case, exampleVis). First run + + $ ./bin/accumulo shell -u root -e 'setauths -u username -s exampleVis' + +then run the Viewer: + + $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.Viewer -i instance -z zookeepers -u username -p password -t dirTable --dataTable dataTable --auths exampleVis --path /local/username/workspace + +To list the contents of specific directories, use QueryUtil.java. + + $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil -i instance -z zookeepers -u username -p password -t dirTable --auths exampleVis --path /local/username + $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil -i instance -z zookeepers -u username -p password -t dirTable --auths exampleVis --path /local/username/workspace + +To perform searches on file or directory names, also use QueryUtil.java. Search terms must contain no more than one wild card and cannot contain "/".
+*Note* these queries run on the _indexTable_ table instead of the dirTable table. + + $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil -i instance -z zookeepers -u username -p password -t indexTable --auths exampleVis --path filename --search + $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil -i instance -z zookeepers -u username -p password -t indexTable --auths exampleVis --path 'filename*' --search + $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil -i instance -z zookeepers -u username -p password -t indexTable --auths exampleVis --path '*jar' --search + $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.QueryUtil -i instance -z zookeepers -u username -p password -t indexTable --auths exampleVis --path 'filename*jar' --search + +To count the number of direct children (directories and files) and descendants (children and children's descendants, directories and files), run FileCount over the dirTable table. +The results are written back to the same table. FileCount reads from and writes to Accumulo. This requires scan authorizations for the read and a visibility for the data written. +In this example, the authorizations and visibility are set to the same value, exampleVis. See README.visibility for more information on visibility and authorizations.
+ + $ ./bin/accumulo org.apache.accumulo.examples.simple.dirlist.FileCount instance zookeepers username password dirTable exampleVis exampleVis + +## Directory Table + +Here is an illustration of what data looks like in the directory table: + + row colf:colq [vis] value + 000 dir:exec [exampleVis] true + 000 dir:hidden [exampleVis] false + 000 dir:lastmod [exampleVis] 1291996886000 + 000 dir:length [exampleVis] 1666 + 001/local dir:exec [exampleVis] true + 001/local dir:hidden [exampleVis] false + 001/local dir:lastmod [exampleVis] 1304945270000 + 001/local dir:length [exampleVis] 272 + 002/local/Accumulo.README \x7F\xFF\xFE\xCFH\xA1\x82\x97:exec [exampleVis] false + 002/local/Accumulo.README \x7F\xFF\xFE\xCFH\xA1\x82\x97:hidden [exampleVis] false + 002/local/Accumulo.README \x7F\xFF\xFE\xCFH\xA1\x82\x97:lastmod [exampleVis] 1308746481000 + 002/local/Accumulo.README \x7F\xFF\xFE\xCFH\xA1\x82\x97:length [exampleVis] 9192 + 002/local/Accumulo.README \x7F\xFF\xFE\xCFH\xA1\x82\x97:md5 [exampleVis] 274af6419a3c4c4a259260ac7017cbf1 + +The rows are of the form depth + path, where depth is the number of slashes ("/") in the path padded to 3 digits. This is so that all the children of a directory appear as consecutive keys in Accumulo; without the depth, you would for example see all the subdirectories of /local before you saw /usr. +For directories the column family is "dir". For files the column family is Long.MAX_VALUE - lastModified in bytes rather than string format so that newer versions sort earlier. + +## Index Table + +Here is an illustration of what data looks like in the index table: + + row colf:colq [vis] + fAccumulo.README i:002/local/Accumulo.README [exampleVis] + flocal i:001/local [exampleVis] + rEMDAER.olumuccA i:002/local/Accumulo.README [exampleVis] + rlacol i:001/local [exampleVis] + +The values of the index table are null. The rows are of the form "f" + filename or "r" + reverse file name.
This is to enable searches with wildcards at the beginning, middle, or end. + +## Data Table + +Here is an illustration of what data looks like in the data table: + + row colf:colq [vis] value + 274af6419a3c4c4a259260ac7017cbf1 refs:e77276a2b56e5c15b540eaae32b12c69\x00filext [exampleVis] README + 274af6419a3c4c4a259260ac7017cbf1 refs:e77276a2b56e5c15b540eaae32b12c69\x00name [exampleVis] /local/Accumulo.README + 274af6419a3c4c4a259260ac7017cbf1 ~chunk:\x00\x0FB@\x00\x00\x00\x00 [exampleVis] *******************************************************************************\x0A1. Building\x0A\x0AIn the normal tarball or RPM release of accumulo, [truncated] + 274af6419a3c4c4a259260ac7017cbf1 ~chunk:\x00\x0FB@\x00\x00\x00\x01 [exampleVis] + +The rows are the md5 hash of the file. Some column family : column qualifier pairs are "refs" : hash of file name + null byte + property name, in which case the value is property value. There can be multiple references to the same file which are distinguished by the hash of the file name. +Other column family : column qualifier pairs are "~chunk" : chunk size in bytes + chunk number in bytes, in which case the value is the bytes for that chunk of the file. There is an end of file data marker whose chunk number is the number of chunks for the file and whose value is empty. + +There may exist multiple copies of the same file (with the same md5 hash) with different chunk sizes or different visibilities. There is an iterator that can be set on the data table that combines these copies into a single copy with a visibility taken from the visibilities of the file references, e.g. (vis from ref1)|(vis from ref2). 
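The row encodings described in the directory and index table sections can be sketched in a few lines of Java (a hypothetical helper for illustration; the real Ingest code builds these rows from byte arrays and Text objects):

```java
// Sketch of the dirlist row encodings; hypothetical helper, not the Ingest code.
class RowEncodingSketch {
    // Directory table row: 3-digit depth (number of slashes) + path, so all
    // children of a directory sort as consecutive keys.
    public static String dirRow(String path) {
        int depth = 0;
        for (int i = 0; i < path.length(); i++)
            if (path.charAt(i) == '/')
                depth++;
        return String.format("%03d", depth) + path;
    }

    // Index table rows: "f" + file name for prefix searches,
    // "r" + reversed file name for suffix searches like '*jar'.
    public static String forwardRow(String fileName) {
        return "f" + fileName;
    }

    public static String reverseRow(String fileName) {
        return "r" + new StringBuilder(fileName).reverse();
    }

    public static void main(String[] args) {
        System.out.println(dirRow("/local/Accumulo.README")); // 002/local/Accumulo.README
        System.out.println(forwardRow("Accumulo.README"));    // fAccumulo.README
        System.out.println(reverseRow("Accumulo.README"));    // rEMDAER.olumuccA
    }
}
```

For example, /local/Accumulo.README yields the directory row 002/local/Accumulo.README and the reversed index row rEMDAER.olumuccA, matching the illustrations above.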
http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/examples/README.export ---------------------------------------------------------------------- diff --git a/server/src/main/resources/docs/examples/README.export b/server/src/main/resources/docs/examples/README.export new file mode 100644 index 0000000..d45c202 --- /dev/null +++ b/server/src/main/resources/docs/examples/README.export @@ -0,0 +1,90 @@ +Title: Apache Accumulo Export/Import Example +Notice: Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + . + http://www.apache.org/licenses/LICENSE-2.0 + . + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +Accumulo provides a mechanism to export and import tables. This README shows +how to use this feature. + +The shell session below shows creating a table, inserting data, and exporting +the table. A table must be offline to export it, and it should remain offline +for the duration of the distcp. An easy way to take a table offline without +interrupting access to it is to clone it and take the clone offline.
+ + root@test15> createtable table1 + root@test15 table1> insert a cf1 cq1 v1 + root@test15 table1> insert h cf1 cq1 v2 + root@test15 table1> insert z cf1 cq1 v3 + root@test15 table1> insert z cf1 cq2 v4 + root@test15 table1> addsplits -t table1 b r + root@test15 table1> scan + a cf1:cq1 [] v1 + h cf1:cq1 [] v2 + z cf1:cq1 [] v3 + z cf1:cq2 [] v4 + root@test15> config -t table1 -s table.split.threshold=100M + root@test15 table1> clonetable table1 table1_exp + root@test15 table1> offline table1_exp + root@test15 table1> exporttable -t table1_exp /tmp/table1_export + root@test15 table1> quit + +After executing the export command, a few files are created in the HDFS directory. +One of the files is a list of files to distcp as shown below. + + $ hadoop fs -ls /tmp/table1_export + Found 2 items + -rw-r--r-- 3 user supergroup 162 2012-07-25 09:56 /tmp/table1_export/distcp.txt + -rw-r--r-- 3 user supergroup 821 2012-07-25 09:56 /tmp/table1_export/exportMetadata.zip + $ hadoop fs -cat /tmp/table1_export/distcp.txt + hdfs://n1.example.com:6093/accumulo/tables/3/default_tablet/F0000000.rf + hdfs://n1.example.com:6093/tmp/table1_export/exportMetadata.zip + +Before the table can be imported, it must be copied using distcp. After the +distcp completes, the cloned table may be deleted. + + $ hadoop distcp -f /tmp/table1_export/distcp.txt /tmp/table1_export_dest + +The Accumulo shell session below shows importing the table and inspecting it. +The data, splits, config, and logical time information for the table were +preserved.
+ + root@test15> importtable table1_copy /tmp/table1_export_dest + root@test15> table table1_copy + root@test15 table1_copy> scan + a cf1:cq1 [] v1 + h cf1:cq1 [] v2 + z cf1:cq1 [] v3 + z cf1:cq2 [] v4 + root@test15 table1_copy> getsplits -t table1_copy + b + r + root@test15> config -t table1_copy -f split + ---------+--------------------------+------------------------------------------- + SCOPE | NAME | VALUE + ---------+--------------------------+------------------------------------------- + default | table.split.threshold .. | 1G + table | @override ........... | 100M + ---------+--------------------------+------------------------------------------- + root@test15> tables -l + !METADATA => !0 + trace => 1 + table1_copy => 5 + root@test15 table1_copy> scan -t !METADATA -b 5 -c srv:time + 5;b srv:time [] M1343224500467 + 5;r srv:time [] M1343224500467 + 5< srv:time [] M1343224500467 + + http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/examples/README.filedata ---------------------------------------------------------------------- diff --git a/server/src/main/resources/docs/examples/README.filedata b/server/src/main/resources/docs/examples/README.filedata new file mode 100644 index 0000000..946ca8c --- /dev/null +++ b/server/src/main/resources/docs/examples/README.filedata @@ -0,0 +1,47 @@ +Title: Apache Accumulo File System Archive Example (Data Only) +Notice: Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + . + http://www.apache.org/licenses/LICENSE-2.0 + . 
+ Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +This example archives file data into an Accumulo table. Files with duplicate data are only stored once. +The example has the following classes: + + * CharacterHistogram - A MapReduce that computes a histogram of byte frequency for each file and stores the histogram alongside the file data. An example use of the ChunkInputFormat. + * ChunkCombiner - An Iterator that dedupes file data and sets their visibilities to a combined visibility based on current references to the file data. + * ChunkInputFormat - An Accumulo InputFormat that provides keys containing file info (List<Entry<Key,Value>>) and values with an InputStream over the file (ChunkInputStream). + * ChunkInputStream - An input stream over file data stored in Accumulo. + * FileDataIngest - Takes a list of files and archives them into Accumulo keyed on the MD5 hashes of the files. + * FileDataQuery - Retrieves file data based on the MD5 hash of the file. (Used by the dirlist.Viewer.) + * KeyUtil - A utility for creating and parsing null-byte separated strings into/from Text objects. + * VisibilityCombiner - A utility for merging visibilities into the form (VIS1)|(VIS2)|... + +This example is coupled with the dirlist example. See README.dirlist for instructions. + +If you haven't already run the README.dirlist example, ingest a file with FileDataIngest. + + $ ./bin/accumulo org.apache.accumulo.examples.simple.filedata.FileDataIngest -i instance -z zookeepers -u username -p password -t dataTable --auths exampleVis --chunk 1000 $ACCUMULO_HOME/README + +Open the Accumulo shell and look at the data. The row is the MD5 hash of the file, which you can verify by running a command such as 'md5sum' on the file.
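The row key is just a hex-encoded MD5 digest, so it can be reproduced outside Accumulo. A minimal sketch (a hypothetical Md5RowSketch helper, not the FileDataIngest source):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of hashing file bytes to the hex row key; hypothetical helper class.
class Md5RowSketch {
    // Hex-encode an MD5 digest the way tools like md5sum print it.
    public static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest)
                sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // MD5 is always present in the JDK
        }
    }

    public static void main(String[] args) {
        // Hashing the file's bytes yields the row key; shown here on a literal.
        System.out.println(md5Hex("abc".getBytes(StandardCharsets.UTF_8)));
        // prints 900150983cd24fb0d6963f7d28e17f72
    }
}
```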
+ + > scan -t dataTable + +Run the CharacterHistogram MapReduce to add some information about the file. + + $ bin/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.filedata.CharacterHistogram -i instance -z zookeepers -u username -p password -t dataTable --auths exampleVis --vis exampleVis + +Scan again to see the histogram stored in the 'info' column family. + + > scan -t dataTable http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/examples/README.filter ---------------------------------------------------------------------- diff --git a/server/src/main/resources/docs/examples/README.filter b/server/src/main/resources/docs/examples/README.filter new file mode 100644 index 0000000..a320554 --- /dev/null +++ b/server/src/main/resources/docs/examples/README.filter @@ -0,0 +1,110 @@ +Title: Apache Accumulo Filter Example +Notice: Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + . + http://www.apache.org/licenses/LICENSE-2.0 + . + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +This is a simple filter example. It uses the AgeOffFilter that is provided as +part of the core package org.apache.accumulo.core.iterators.user. Filters are +iterators that select desired key/value pairs (or weed out undesired ones). 
+Filters extend the org.apache.accumulo.core.iterators.Filter class +and must implement a method accept(Key k, Value v). This method returns true +if the key/value pair is to be delivered and false if it is to be ignored. +Filter takes a "negate" parameter which defaults to false. If set to true, the +return value of the accept method is negated, so that key/value pairs accepted +by the method are omitted by the Filter. + + username@instance> createtable filtertest + username@instance filtertest> setiter -t filtertest -scan -p 10 -n myfilter -ageoff + AgeOffFilter removes entries with timestamps more than <ttl> milliseconds old + ----------> set AgeOffFilter parameter negate, default false keeps k/v that pass accept method, true rejects k/v that pass accept method: + ----------> set AgeOffFilter parameter ttl, time to live (milliseconds): 30000 + ----------> set AgeOffFilter parameter currentTime, if set, use the given value as the absolute time in milliseconds as the current time of day: + username@instance filtertest> scan + username@instance filtertest> insert foo a b c + username@instance filtertest> scan + foo a:b [] c + username@instance filtertest> + +... wait 30 seconds ... + + username@instance filtertest> scan + username@instance filtertest> + +Note the absence of the entry inserted more than 30 seconds ago. Since the +scope was set to "scan", this means the entry is still in Accumulo, but is +being filtered out at query time. To delete entries from Accumulo based on +the ages of their timestamps, AgeOffFilters should be set up for the "minc" +and "majc" scopes, as well. + +To force an ageoff of the persisted data, after setting up the ageoff iterator +on the "minc" and "majc" scopes you can flush and compact your table. This will +happen automatically as a background operation on any table that is being +actively written to, but can also be requested in the shell.
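The decision an age-off filter makes for each entry boils down to a timestamp comparison. A plain-Java sketch of that accept logic, including the negate option (illustrative only, not the AgeOffFilter source):

```java
// Sketch of age-off accept logic; hypothetical helper, not the AgeOffFilter code.
class AgeOffSketch {
    // Keep an entry if its timestamp is within ttl milliseconds of currentTime;
    // negate flips the decision, mirroring the Filter "negate" option.
    public static boolean accept(long timestamp, long currentTime, long ttl, boolean negate) {
        boolean keep = currentTime - timestamp <= ttl;
        return negate ? !keep : keep;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        System.out.println(accept(now - 10_000, now, 30_000, false)); // true: 10s old, kept
        System.out.println(accept(now - 40_000, now, 30_000, false)); // false: aged off
        System.out.println(accept(now - 40_000, now, 30_000, true));  // true: negated
    }
}
```

With ttl set to 30000 as in the session above, the entry inserted 30+ seconds earlier fails this test and is filtered from scans.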
+ +The first setiter command used the special -ageoff flag to specify the +AgeOffFilter, but any Filter can be configured by using the -class flag. The +following commands show how to enable the AgeOffFilter for the minc and majc +scopes using the -class flag, then flush and compact the table. + + username@instance filtertest> setiter -t filtertest -minc -majc -p 10 -n myfilter -class org.apache.accumulo.core.iterators.user.AgeOffFilter + AgeOffFilter removes entries with timestamps more than <ttl> milliseconds old + ----------> set AgeOffFilter parameter negate, default false keeps k/v that pass accept method, true rejects k/v that pass accept method: + ----------> set AgeOffFilter parameter ttl, time to live (milliseconds): 30000 + ----------> set AgeOffFilter parameter currentTime, if set, use the given value as the absolute time in milliseconds as the current time of day: + username@instance filtertest> flush + 06 10:42:24,806 [shell.Shell] INFO : Flush of table filtertest initiated... + username@instance filtertest> compact + 06 10:42:36,781 [shell.Shell] INFO : Compaction of table filtertest started for given range + username@instance filtertest> flush -t filtertest -w + 06 10:42:52,881 [shell.Shell] INFO : Flush of table filtertest completed. + username@instance filtertest> compact -t filtertest -w + 06 10:43:00,632 [shell.Shell] INFO : Compacting table ... + 06 10:43:01,307 [shell.Shell] INFO : Compaction of table filtertest completed for given range + username@instance filtertest> + +By default, flush and compact execute in the background, but with the -w flag +they will wait to return until the operation has completed. Both are +demonstrated above, though only one call to each would be necessary. A +specific table can be specified with -t. + +After the compaction runs, the newly created files will not contain any data +that should have been aged off, and the Accumulo garbage collector will remove +the old files.
+ +To see the iterator settings for a table, use config. + + username@instance filtertest> config -t filtertest -f iterator + ---------+---------------------------------------------+--------------------------------------------------------------------------- + SCOPE | NAME | VALUE + ---------+---------------------------------------------+--------------------------------------------------------------------------- + table | table.iterator.majc.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter + table | table.iterator.majc.myfilter.opt.ttl ...... | 30000 + table | table.iterator.majc.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator + table | table.iterator.majc.vers.opt.maxVersions .. | 1 + table | table.iterator.minc.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter + table | table.iterator.minc.myfilter.opt.ttl ...... | 30000 + table | table.iterator.minc.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator + table | table.iterator.minc.vers.opt.maxVersions .. | 1 + table | table.iterator.scan.myfilter .............. | 10,org.apache.accumulo.core.iterators.user.AgeOffFilter + table | table.iterator.scan.myfilter.opt.ttl ...... | 30000 + table | table.iterator.scan.vers .................. | 20,org.apache.accumulo.core.iterators.user.VersioningIterator + table | table.iterator.scan.vers.opt.maxVersions .. | 1 + ---------+---------------------------------------------+--------------------------------------------------------------------------- + username@instance filtertest> + +When setting new iterators, make sure to order their priority numbers +(specified with -p) in the order you would like the iterators to be applied. +Also, each iterator must have a unique name and priority within each scope. 
http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/examples/README.helloworld ---------------------------------------------------------------------- diff --git a/server/src/main/resources/docs/examples/README.helloworld b/server/src/main/resources/docs/examples/README.helloworld new file mode 100644 index 0000000..be95014 --- /dev/null +++ b/server/src/main/resources/docs/examples/README.helloworld @@ -0,0 +1,47 @@ +Title: Apache Accumulo Hello World Example +Notice: Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + . + http://www.apache.org/licenses/LICENSE-2.0 + . + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. 
+ +This tutorial uses the following Java classes, which can be found in org.apache.accumulo.examples.simple.helloworld in the examples-simple module: + + * InsertWithBatchWriter.java - Inserts 10K rows (50K entries) into accumulo with each row having 5 entries + * ReadData.java - Reads all data between two rows + +Log into the accumulo shell: + + $ ./bin/accumulo shell -u username -p password + +Create a table called 'hellotable': + + username@instance> createtable hellotable + +Launch a Java program that inserts data with a BatchWriter: + + $ ./bin/accumulo org.apache.accumulo.examples.simple.helloworld.InsertWithBatchWriter -i instance -z zookeepers -u username -p password -t hellotable + +On the accumulo status page at the URL below (where 'master' is replaced with the name or IP of your accumulo master), you should see 50K entries + + http://master:50095/ + +To view the entries, use the shell to scan the table: + + username@instance> table hellotable + username@instance hellotable> scan + +You can also use a Java class to scan the table: + + $ ./bin/accumulo org.apache.accumulo.examples.simple.helloworld.ReadData -i instance -z zookeepers -u username -p password -t hellotable --startKey row_0 --endKey row_1001 http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/examples/README.isolation ---------------------------------------------------------------------- diff --git a/server/src/main/resources/docs/examples/README.isolation b/server/src/main/resources/docs/examples/README.isolation new file mode 100644 index 0000000..06d5aeb --- /dev/null +++ b/server/src/main/resources/docs/examples/README.isolation @@ -0,0 +1,50 @@ +Title: Apache Accumulo Isolation Example +Notice: Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. 
The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+ .
+ http://www.apache.org/licenses/LICENSE-2.0
+ .
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+
+Accumulo has an isolated scanner that ensures partial changes to rows are not
+seen. Isolation is documented in ../docs/isolation.html and the user manual.
+
+InterferenceTest is a simple example that shows the effects of scanning with
+and without isolation. This program starts two threads. One thread
+continually updates all of the values in a row to be the same thing, but
+different from what it used to be. The other thread continually scans the
+table and checks that all values in a row are the same. Without isolation the
+scanning thread will sometimes see different values, which is the result of
+reading the row at the same time a mutation is changing the row.
+
+Below, Interference Test is run without isolation enabled for 5000 iterations
+and it reports problems.
+ + $ ./bin/accumulo org.apache.accumulo.examples.simple.isolation.InterferenceTest -i instance -z zookeepers -u username -p password -t isotest --iterations 5000 + ERROR Columns in row 053 had multiple values [53, 4553] + ERROR Columns in row 061 had multiple values [561, 61] + ERROR Columns in row 070 had multiple values [570, 1070] + ERROR Columns in row 079 had multiple values [1079, 1579] + ERROR Columns in row 088 had multiple values [2588, 1588] + ERROR Columns in row 106 had multiple values [2606, 3106] + ERROR Columns in row 115 had multiple values [4615, 3115] + finished + +Below, Interference Test is run with isolation enabled for 5000 iterations and +it reports no problems. + + $ ./bin/accumulo org.apache.accumulo.examples.simple.isolation.InterferenceTest -i instance -z zookeepers -u username -p password -t isotest --iterations 5000 --isolated + finished + + http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/examples/README.mapred ---------------------------------------------------------------------- diff --git a/server/src/main/resources/docs/examples/README.mapred b/server/src/main/resources/docs/examples/README.mapred new file mode 100644 index 0000000..4acd306 --- /dev/null +++ b/server/src/main/resources/docs/examples/README.mapred @@ -0,0 +1,97 @@ +Title: Apache Accumulo MapReduce Example +Notice: Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + . + http://www.apache.org/licenses/LICENSE-2.0 + . 
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+This example uses mapreduce and accumulo to compute word counts for a set of
+documents. This is accomplished using a map-only mapreduce job and an
+accumulo table with combiners.
+
+To run this example you will need a directory in HDFS containing text files.
+The accumulo readme will be used to show how to run this example.
+
+ $ hadoop fs -copyFromLocal $ACCUMULO_HOME/README /user/username/wc/Accumulo.README
+ $ hadoop fs -ls /user/username/wc
+ Found 1 items
+ -rw-r--r-- 2 username supergroup 9359 2009-07-15 17:54 /user/username/wc/Accumulo.README
+
+The first part of running this example is to create a table with a combiner
+for the column family count.
+
+ $ ./bin/accumulo shell -u username -p password
+ Shell - Apache Accumulo Interactive Shell
+ - version: 1.5.0
+ - instance name: instance
+ - instance id: 00000000-0000-0000-0000-000000000000
+ -
+ - type 'help' for a list of available commands
+ -
+ username@instance> createtable wordCount
+ username@instance wordCount> setiter -class org.apache.accumulo.core.iterators.user.SummingCombiner -p 10 -t wordCount -majc -minc -scan
+ SummingCombiner interprets Values as Longs and adds them together. A variety of encodings (variable length, fixed length, or string) are available
+ ----------> set SummingCombiner parameter all, set to true to apply Combiner to every column, otherwise leave blank. if true, columns option will be ignored.: false
+ ----------> set SummingCombiner parameter columns, [:]{,[:]} escape non-alphanum chars using %.: count
+ ----------> set SummingCombiner parameter lossy, if true, failed decodes are ignored.
Otherwise combiner will error on failed decodes (default false): : false + ----------> set SummingCombiner parameter type, : STRING + username@instance wordCount> quit + +After creating the table, run the word count map reduce job. + + $ bin/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.WordCount -i instance -z zookeepers --input /user/username/wc wordCount -u username -p password + + 11/02/07 18:20:11 INFO input.FileInputFormat: Total input paths to process : 1 + 11/02/07 18:20:12 INFO mapred.JobClient: Running job: job_201102071740_0003 + 11/02/07 18:20:13 INFO mapred.JobClient: map 0% reduce 0% + 11/02/07 18:20:20 INFO mapred.JobClient: map 100% reduce 0% + 11/02/07 18:20:22 INFO mapred.JobClient: Job complete: job_201102071740_0003 + 11/02/07 18:20:22 INFO mapred.JobClient: Counters: 6 + 11/02/07 18:20:22 INFO mapred.JobClient: Job Counters + 11/02/07 18:20:22 INFO mapred.JobClient: Launched map tasks=1 + 11/02/07 18:20:22 INFO mapred.JobClient: Data-local map tasks=1 + 11/02/07 18:20:22 INFO mapred.JobClient: FileSystemCounters + 11/02/07 18:20:22 INFO mapred.JobClient: HDFS_BYTES_READ=10487 + 11/02/07 18:20:22 INFO mapred.JobClient: Map-Reduce Framework + 11/02/07 18:20:22 INFO mapred.JobClient: Map input records=255 + 11/02/07 18:20:22 INFO mapred.JobClient: Spilled Records=0 + 11/02/07 18:20:22 INFO mapred.JobClient: Map output records=1452 + +After the map reduce job completes, query the accumulo table to see word +counts. + + $ ./bin/accumulo shell -u username -p password + username@instance> table wordCount + username@instance wordCount> scan -b the + the count:20080906 [] 75 + their count:20080906 [] 2 + them count:20080906 [] 1 + then count:20080906 [] 1 + there count:20080906 [] 1 + these count:20080906 [] 3 + this count:20080906 [] 6 + through count:20080906 [] 1 + time count:20080906 [] 3 + time. 
count:20080906 [] 1
+ to count:20080906 [] 27
+ total count:20080906 [] 1
+ tserver, count:20080906 [] 1
+ tserver.compaction.major.concurrent.max count:20080906 [] 1
+ ...
+
+Another example to look at is
+org.apache.accumulo.examples.simple.mapreduce.UniqueColumns. This example
+computes the unique set of columns in a table and shows how a map reduce job
+can directly read a table's files from HDFS.
+
http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/examples/README.maxmutation
----------------------------------------------------------------------
diff --git a/server/src/main/resources/docs/examples/README.maxmutation b/server/src/main/resources/docs/examples/README.maxmutation
new file mode 100644
index 0000000..aa679a8
--- /dev/null
+++ b/server/src/main/resources/docs/examples/README.maxmutation
@@ -0,0 +1,47 @@
+Title: Apache Accumulo MaxMutation Constraints Example
+Notice: Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+ .
+ http://www.apache.org/licenses/LICENSE-2.0
+ .
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+This is an example of how to limit the size of mutations that will be accepted into
+a table. Under the default configuration, accumulo does not provide a limitation
+on the size of mutations that can be ingested.
Poorly behaved writers might
+inadvertently create mutations so large that they cause the tablet servers to
+run out of memory. A simple constraint can be added to a table to reject very
+large mutations.
+
+ $ ./bin/accumulo shell -u username -p password
+
+ Shell - Apache Accumulo Interactive Shell
+ -
+ - version: 1.5.0
+ - instance name: instance
+ - instance id: 00000000-0000-0000-0000-000000000000
+ -
+ - type 'help' for a list of available commands
+ -
+ username@instance> createtable test_ingest
+ username@instance test_ingest> config -t test_ingest -s table.constraint.1=org.apache.accumulo.examples.simple.constraints.MaxMutationSize
+ username@instance test_ingest>
+
+
+Now the table will reject any mutation that is larger than 1/256th of the
+working memory of the tablet server. The following command attempts to ingest
+a single row with 10000 columns, which exceeds the memory limit:
+
+ $ ./bin/accumulo org.apache.accumulo.test.TestIngest -i instance -z zookeepers -u username -p password --rows 1 --cols 10000
+ERROR : Constraint violates : ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.simple.constraints.MaxMutationSize, violationCode:0, violationDescription:mutation exceeded maximum size of 188160, numberOfViolatingMutations:1)
+
http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/examples/README.regex
----------------------------------------------------------------------
diff --git a/server/src/main/resources/docs/examples/README.regex b/server/src/main/resources/docs/examples/README.regex
new file mode 100644
index 0000000..f23190f
--- /dev/null
+++ b/server/src/main/resources/docs/examples/README.regex
@@ -0,0 +1,58 @@
+Title: Apache Accumulo Regex Example
+Notice: Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership.
The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + . + http://www.apache.org/licenses/LICENSE-2.0 + . + Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +This example uses mapreduce and accumulo to find items using regular expressions. +This is accomplished using a map-only mapreduce job and a scan-time iterator. + +To run this example you will need some data in a table. The following will +put a trivial amount of data into accumulo using the accumulo shell: + + $ ./bin/accumulo shell -u username -p password + Shell - Apache Accumulo Interactive Shell + - version: 1.5.0 + - instance name: instance + - instance id: 00000000-0000-0000-0000-000000000000 + - + - type 'help' for a list of available commands + - + username@instance> createtable input + username@instance> insert dogrow dogcf dogcq dogvalue + username@instance> insert catrow catcf catcq catvalue + username@instance> quit + +The RegexExample class sets an iterator on the scanner. This does pattern matching +against each key/value in accumulo, and only returns matching items. It will do this +in parallel and will store the results in files in hdfs. 
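+
+The scan-time row matching the iterator performs can be sketched in plain
+Java. This is an illustration only, not the RegexExample or RegExFilter
+source: only rows whose full row id matches the pattern are returned.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java sketch of scan-time row filtering (illustration only, not
// Accumulo's RegExFilter code): keep rows whose id matches the pattern.
public class RowRegexSketch {

    public static List<String> matchingRows(List<String> rows, String rowRegex) {
        Pattern pattern = Pattern.compile(rowRegex);
        List<String> matches = new ArrayList<>();
        for (String row : rows) {
            if (pattern.matcher(row).matches()) {
                matches.add(row);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("dogrow", "catrow");
        System.out.println(matchingRows(rows, "dog.*")); // [dogrow]
    }
}
```

+Because the filtering happens inside the tablet servers, each mapper only
+receives entries that already matched, which is what makes the job map-only.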
+
+The following will search for any rows in the input table that start with "dog":
+
+ $ bin/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.RegexExample -u user -p passwd -i instance -t input --rowRegex 'dog.*' --output /tmp/output
+
+ $ hadoop fs -ls /tmp/output
+ Found 3 items
+ -rw-r--r-- 1 username supergroup 0 2013-01-10 14:11 /tmp/output/_SUCCESS
+ drwxr-xr-x - username supergroup 0 2013-01-10 14:10 /tmp/output/_logs
+ -rw-r--r-- 1 username supergroup 51 2013-01-10 14:10 /tmp/output/part-m-00000
+
+We can see the output of our little map-reduce job:
+
+ $ hadoop fs -text /tmp/output/part-m-00000
+ dogrow dogcf:dogcq [] 1357844987994 false dogvalue
+ $
+
+
http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/examples/README.rowhash
----------------------------------------------------------------------
diff --git a/server/src/main/resources/docs/examples/README.rowhash b/server/src/main/resources/docs/examples/README.rowhash
new file mode 100644
index 0000000..e7fbfed
--- /dev/null
+++ b/server/src/main/resources/docs/examples/README.rowhash
@@ -0,0 +1,59 @@
+Title: Apache Accumulo RowHash Example
+Notice: Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+ .
+ http://www.apache.org/licenses/LICENSE-2.0
+ .
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+This example shows a simple map/reduce job that reads from an accumulo table and
+writes back into that table.
+
+To run this example you will need some data in a table. The following will
+put a trivial amount of data into accumulo using the accumulo shell:
+
+ $ ./bin/accumulo shell -u username -p password
+ Shell - Apache Accumulo Interactive Shell
+ - version: 1.5.0
+ - instance name: instance
+ - instance id: 00000000-0000-0000-0000-000000000000
+ -
+ - type 'help' for a list of available commands
+ -
+ username@instance> createtable input
+ username@instance> insert a-row cf cq value
+ username@instance> insert b-row cf cq value
+ username@instance> quit
+
+The RowHash class will insert a hash for each row in the database if it contains a
+specified column. Here's how to run the map/reduce job:
+
+ $ bin/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.RowHash -u user -p passwd -i instance -t input --column cf:cq
+
+Now we can scan the table and see the hashes:
+
+ $ ./bin/accumulo shell -u username -p password
+ Shell - Apache Accumulo Interactive Shell
+ - version: 1.5.0
+ - instance name: instance
+ - instance id: 00000000-0000-0000-0000-000000000000
+ -
+ - type 'help' for a list of available commands
+ -
+ username@instance> scan -t input
+ a-row cf:cq [] value
+ a-row cf-HASHTYPE:cq-MD5BASE64 [] IGPBYI1uC6+AJJxC4r5YBA==
+ b-row cf:cq [] value
+ b-row cf-HASHTYPE:cq-MD5BASE64 [] IGPBYI1uC6+AJJxC4r5YBA==
+ username@instance>
+
http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/examples/README.shard
----------------------------------------------------------------------
diff --git a/server/src/main/resources/docs/examples/README.shard b/server/src/main/resources/docs/examples/README.shard
new file mode 100644
index 0000000..f79015a
--- /dev/null
+++ b/server/src/main/resources/docs/examples/README.shard
@@ -0,0 +1,67 @@
+Title: Apache Accumulo Shard Example
+Notice: Licensed
to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+ .
+ http://www.apache.org/licenses/LICENSE-2.0
+ .
+ Unless required by applicable law or agreed to in writing,
+ software distributed under the License is distributed on an
+ "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ KIND, either express or implied. See the License for the
+ specific language governing permissions and limitations
+ under the License.
+
+Accumulo has an iterator called the intersecting iterator which supports querying a term index that is partitioned by
+document, or "sharded". This example shows how to use the intersecting iterator through these four programs:
+
+ * Index.java - Indexes a set of text files into an Accumulo table
+ * Query.java - Finds documents containing a given set of terms.
+ * Reverse.java - Reads the index table and writes a map of documents to terms into another table.
+ * ContinuousQuery.java - Uses the table populated by Reverse.java to select N random terms per document. Then it continuously and randomly queries those terms.
+
+To run these example programs, create two tables as shown below.
+
+ username@instance> createtable shard
+ username@instance shard> createtable doc2term
+
+After creating the tables, index some files. The following command indexes all of the java files in the Accumulo source code.
+ + $ cd /local/username/workspace/accumulo/ + $ find core/src server/src -name "*.java" | xargs ./bin/accumulo org.apache.accumulo.examples.simple.shard.Index -i instance -z zookeepers -t shard -u username -p password --partitions 30 + +The following command queries the index to find all files containing 'foo' and 'bar'. + + $ cd $ACCUMULO_HOME + $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.Query -i instance -z zookeepers -t shard -u username -p password foo bar + /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/security/ColumnVisibilityTest.java + /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/client/mock/MockConnectorTest.java + /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/security/VisibilityEvaluatorTest.java + /local/username/workspace/accumulo/src/server/src/main/java/accumulo/test/functional/RowDeleteTest.java + /local/username/workspace/accumulo/src/server/src/test/java/accumulo/server/logger/TestLogWriter.java + /local/username/workspace/accumulo/src/server/src/main/java/accumulo/test/functional/DeleteEverythingTest.java + /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/data/KeyExtentTest.java + /local/username/workspace/accumulo/src/server/src/test/java/accumulo/server/constraints/MetadataConstraintsTest.java + /local/username/workspace/accumulo/src/core/src/test/java/accumulo/core/iterators/WholeRowIteratorTest.java + /local/username/workspace/accumulo/src/server/src/test/java/accumulo/server/util/DefaultMapTest.java + /local/username/workspace/accumulo/src/server/src/test/java/accumulo/server/tabletserver/InMemoryMapTest.java + +In order to run ContinuousQuery, we need to run Reverse.java to populate doc2term. + + $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.Reverse -i instance -z zookeepers --shardTable shard --doc2Term doc2term -u username -p password + +Below ContinuousQuery is run using 5 terms. 
So it selects 5 random terms from each document, then it continually +randomly selects one set of 5 terms and queries. It prints the number of matching documents and the time in seconds. + + $ ./bin/accumulo org.apache.accumulo.examples.simple.shard.ContinuousQuery -i instance -z zookeepers --shardTable shard --doc2Term doc2term -u username -p password --terms 5 + [public, core, class, binarycomparable, b] 2 0.081 + [wordtodelete, unindexdocument, doctablename, putdelete, insert] 1 0.041 + [import, columnvisibilityinterpreterfactory, illegalstateexception, cv, columnvisibility] 1 0.049 + [getpackage, testversion, util, version, 55] 1 0.048 + [for, static, println, public, the] 55 0.211 + [sleeptime, wrappingiterator, options, long, utilwaitthread] 1 0.057 + [string, public, long, 0, wait] 12 0.132 http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/examples/README.tabletofile ---------------------------------------------------------------------- diff --git a/server/src/main/resources/docs/examples/README.tabletofile b/server/src/main/resources/docs/examples/README.tabletofile new file mode 100644 index 0000000..8a4180e --- /dev/null +++ b/server/src/main/resources/docs/examples/README.tabletofile @@ -0,0 +1,59 @@ +Title: Apache Accumulo Table-to-File Example +Notice: Licensed to the Apache Software Foundation (ASF) under one + or more contributor license agreements. See the NOTICE file + distributed with this work for additional information + regarding copyright ownership. The ASF licenses this file + to you under the Apache License, Version 2.0 (the + "License"); you may not use this file except in compliance + with the License. You may obtain a copy of the License at + . + http://www.apache.org/licenses/LICENSE-2.0 + . 
+ Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +This example uses mapreduce to extract specified columns from an existing table. + +To run this example you will need some data in a table. The following will +put a trivial amount of data into accumulo using the accumulo shell: + + $ ./bin/accumulo shell -u username -p password + Shell - Apache Accumulo Interactive Shell + - version: 1.5.0 + - instance name: instance + - instance id: 00000000-0000-0000-0000-000000000000 + - + - type 'help' for a list of available commands + - + username@instance> createtable input + username@instance> insert dog cf cq dogvalue + username@instance> insert cat cf cq catvalue + username@instance> insert junk family qualifier junkvalue + username@instance> quit + +The TableToFile class configures a map-only job to read the specified columns and +write the key/value pairs to a file in HDFS. 
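+
+The column selection the job performs can be sketched as a simple filter over
+the table's entries. The helper below is hypothetical, not the TableToFile
+source, and represents entries as "row family:qualifier" strings the way the
+shell displays them.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch (not the TableToFile source) of selecting only the
// entries whose column matches a requested "family:qualifier".
public class ColumnSelectSketch {

    // Keys are "row family:qualifier" strings, as the shell displays them.
    public static Map<String, String> select(Map<String, String> entries, String column) {
        Map<String, String> selected = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : entries.entrySet()) {
            if (e.getKey().endsWith(" " + column)) {
                selected.put(e.getKey(), e.getValue());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, String> table = new LinkedHashMap<>();
        table.put("dog cf:cq", "dogvalue");
        table.put("cat cf:cq", "catvalue");
        table.put("junk family:qualifier", "junkvalue");
        // Only the cf:cq entries survive; "junk family:qualifier" is dropped.
        System.out.println(select(table, "cf:cq").keySet()); // [dog cf:cq, cat cf:cq]
    }
}
```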
+
+The following will extract the rows containing the column "cf:cq":
+
+ $ bin/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.TableToFile -u user -p passwd -i instance -t input --columns cf:cq --output /tmp/output
+
+ $ hadoop fs -ls /tmp/output
+ -rw-r--r-- 1 username supergroup 0 2013-01-10 14:44 /tmp/output/_SUCCESS
+ drwxr-xr-x - username supergroup 0 2013-01-10 14:44 /tmp/output/_logs
+ drwxr-xr-x - username supergroup 0 2013-01-10 14:44 /tmp/output/_logs/history
+ -rw-r--r-- 1 username supergroup 9049 2013-01-10 14:44 /tmp/output/_logs/history/job_201301081658_0011_1357847072863_username_TableToFile%5F1357847071434
+ -rw-r--r-- 1 username supergroup 26172 2013-01-10 14:44 /tmp/output/_logs/history/job_201301081658_0011_conf.xml
+ -rw-r--r-- 1 username supergroup 50 2013-01-10 14:44 /tmp/output/part-m-00000
+
+We can see the output of our little map-reduce job:
+
+ $ hadoop fs -text /tmp/output/part-m-00000
+ cat cf:cq [] catvalue
+ dog cf:cq [] dogvalue
+ $
+
http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/examples/README.terasort
----------------------------------------------------------------------
diff --git a/server/src/main/resources/docs/examples/README.terasort b/server/src/main/resources/docs/examples/README.terasort
new file mode 100644
index 0000000..cf5051a
--- /dev/null
+++ b/server/src/main/resources/docs/examples/README.terasort
@@ -0,0 +1,50 @@
+Title: Apache Accumulo Terasort Example
+Notice: Licensed to the Apache Software Foundation (ASF) under one
+ or more contributor license agreements. See the NOTICE file
+ distributed with this work for additional information
+ regarding copyright ownership. The ASF licenses this file
+ to you under the Apache License, Version 2.0 (the
+ "License"); you may not use this file except in compliance
+ with the License. You may obtain a copy of the License at
+ .
+ http://www.apache.org/licenses/LICENSE-2.0
+ .
+ Unless required by applicable law or agreed to in writing, + software distributed under the License is distributed on an + "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + KIND, either express or implied. See the License for the + specific language governing permissions and limitations + under the License. + +This example uses map/reduce to generate random input data that will +be sorted by storing it into accumulo. It uses data very similar to the +hadoop terasort benchmark. + +To run this example you run it with arguments describing the amount of data: + + $ bin/tool.sh lib/accumulo-examples-simple.jar org.apache.accumulo.examples.simple.mapreduce.TeraSortIngest \ + -i instance -z zookeepers -u user -p password \ + --count 10 \ + --minKeySize 10 \ + --maxKeySize 10 \ + --minValueSize 78 \ + --maxValueSize 78 \ + --table sort \ + --splits 10 \ + +After the map reduce job completes, scan the data: + + $ ./bin/accumulo shell -u username -p password + username@instance> scan -t sort + +l-$$OE/ZH c: 4 [] GGGGGGGGGGWWWWWWWWWWMMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOO + ,C)wDw//u= c: 10 [] CCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKK + 75@~?'WdUF c: 1 [] IIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQ + ;L+!2rT~hd c: 8 [] MMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUU + LsS8)|.ZLD c: 5 [] OOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGGGGWWWWWWWW + M^*dDE;6^< c: 9 [] UUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGGGGWWWWWWWWWWMMMMMMMMMMCCCCCCCC + ^Eu) createuser username + Enter new password for 'username': ******** + Please confirm new password for 'username': ******** + root@instance> user username + Enter password for user username: ******** + username@instance> createtable vistest + 06 10:48:47,931 [shell.Shell] ERROR: org.apache.accumulo.core.client.AccumuloSecurityException: Error PERMISSION_DENIED - User does not have 
permission to perform this action + username@instance> userpermissions + System permissions: + + Table permissions (!METADATA): Table.READ + username@instance> + +A user does not by default have permission to create a table. + +## Granting permissions to a user + + username@instance> user root + Enter password for user root: ******** + root@instance> grant -s System.CREATE_TABLE -u username + root@instance> user username + Enter password for user username: ******** + username@instance> createtable vistest + username@instance> userpermissions + System permissions: System.CREATE_TABLE + + Table permissions (!METADATA): Table.READ + Table permissions (vistest): Table.READ, Table.WRITE, Table.BULK_IMPORT, Table.ALTER_TABLE, Table.GRANT, Table.DROP_TABLE + username@instance vistest> + +## Inserting data with visibilities + +Visibilities are boolean AND (&) and OR (|) combinations of authorization +tokens. Authorization tokens are arbitrary strings taken from a restricted +ASCII character set. Parentheses are required to specify order of operations +in visibilities. + + username@instance vistest> insert row f1 q1 v1 -l A + username@instance vistest> insert row f2 q2 v2 -l A&B + username@instance vistest> insert row f3 q3 v3 -l apple&carrot|broccoli|spinach + 06 11:19:01,432 [shell.Shell] ERROR: org.apache.accumulo.core.util.BadArgumentException: cannot mix | and & near index 12 + apple&carrot|broccoli|spinach + ^ + username@instance vistest> insert row f3 q3 v3 -l (apple&carrot)|broccoli|spinach + username@instance vistest> + +## Scanning with authorizations + +Authorizations are sets of authorization tokens. Each Accumulo user has +authorizations and each Accumulo scan has authorizations. Scan authorizations +are only allowed to be a subset of the user's authorizations. By default, a +user's authorizations set is empty. 
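+
+A much-simplified sketch of evaluating a visibility expression against a
+scan's authorizations is shown below. It is an illustration only: it handles
+just unparenthesized, pure-AND or pure-OR expressions, mirroring the rule
+above that mixing | and & requires parentheses. The authoritative
+implementation is org.apache.accumulo.core.security.ColumnVisibility, which
+supports full nesting.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Much-simplified sketch of visibility evaluation (illustration only; the
// real parser is org.apache.accumulo.core.security.ColumnVisibility).
// Handles only unparenthesized expressions that are pure-AND or pure-OR.
public class VisibilitySketch {

    public static boolean visible(String expression, Set<String> auths) {
        if (expression.isEmpty()) {
            return true; // an empty visibility is readable by everyone
        }
        if (expression.contains("|")) {
            for (String token : expression.split("\\|")) {
                if (auths.contains(token)) {
                    return true; // OR: holding any one token suffices
                }
            }
            return false;
        }
        for (String token : expression.split("&")) {
            if (!auths.contains(token)) {
                return false; // AND: every token is required
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Set<String> auths = new HashSet<>(Arrays.asList("A"));
        System.out.println(visible("A", auths));   // true: token held
        System.out.println(visible("A&B", auths)); // false: B is missing
        System.out.println(visible("A|B", auths)); // true: A alone suffices
    }
}
```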
+ + username@instance vistest> scan + username@instance vistest> scan -s A + 06 11:43:14,951 [shell.Shell] ERROR: java.lang.RuntimeException: org.apache.accumulo.core.client.AccumuloSecurityException: Error BAD_AUTHORIZATIONS - The user does not have the specified authorizations assigned + username@instance vistest> + +## Setting authorizations for a user + + username@instance vistest> setauths -s A + 06 11:53:42,056 [shell.Shell] ERROR: org.apache.accumulo.core.client.AccumuloSecurityException: Error PERMISSION_DENIED - User does not have permission to perform this action + username@instance vistest> + +A user cannot set authorizations unless the user has the System.ALTER_USER permission. +The root user has this permission. + + username@instance vistest> user root + Enter password for user root: ******** + root@instance vistest> setauths -s A -u username + root@instance vistest> user username + Enter password for user username: ******** + username@instance vistest> scan -s A + row f1:q1 [A] v1 + username@instance vistest> scan + row f1:q1 [A] v1 + username@instance vistest> + +The default authorizations for a scan are the user's entire set of authorizations. + + username@instance vistest> user root + Enter password for user root: ******** + root@instance vistest> setauths -s A,B,broccoli -u username + root@instance vistest> user username + Enter password for user username: ******** + username@instance vistest> scan + row f1:q1 [A] v1 + row f2:q2 [A&B] v2 + row f3:q3 [(apple&carrot)|broccoli|spinach] v3 + username@instance vistest> scan -s B + username@instance vistest> + +If you want, you can limit a user to only be able to insert data which they can read themselves. +It can be set with the following constraint. 
+ + username@instance vistest> user root + Enter password for user root: ****** + root@instance vistest> config -t vistest -s table.constraint.1=org.apache.accumulo.core.security.VisibilityConstraint + root@instance vistest> user username + Enter password for user username: ******** + username@instance vistest> insert row f4 q4 v4 -l spinach + Constraint Failures: + ConstraintViolationSummary(constrainClass:org.apache.accumulo.core.security.VisibilityConstraint, violationCode:2, violationDescription:User does not have authorization on column visibility, numberOfViolatingMutations:1) + username@instance vistest> insert row f4 q4 v4 -l spinach|broccoli + username@instance vistest> scan + row f1:q1 [A] v1 + row f2:q2 [A&B] v2 + row f3:q3 [(apple&carrot)|broccoli|spinach] v3 + row f4:q4 [spinach|broccoli] v4 + username@instance vistest> + http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/index.html ---------------------------------------------------------------------- diff --git a/server/src/main/resources/docs/index.html b/server/src/main/resources/docs/index.html new file mode 100644 index 0000000..fa399fb --- /dev/null +++ b/server/src/main/resources/docs/index.html @@ -0,0 +1,41 @@ + + + +Accumulo Documentation + + + + +

Apache Accumulo Documentation

+ + + + http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/isolation.html ---------------------------------------------------------------------- diff --git a/server/src/main/resources/docs/isolation.html b/server/src/main/resources/docs/isolation.html new file mode 100644 index 0000000..d0e77cc --- /dev/null +++ b/server/src/main/resources/docs/isolation.html @@ -0,0 +1,39 @@ + + + +Accumulo Isolation + + + + +

Apache Accumulo Documentation : Isolation

+ +

Scanning

+ +

Accumulo supports the ability to present an isolated view of rows when scanning. There are three possible ways that a row could change in Accumulo: +

    +
  • a mutation applied to a table +
  • iterators executed as part of a minor or major compaction +
  • bulk import of new files +
+Isolation guarantees that either all or none of the changes made by these operations on a row are seen. Use the IsolatedScanner to obtain an isolated view of an Accumulo table. When using the regular scanner, it is possible to see a non-isolated view of a row. For example, if a mutation modifies three columns, you may see only two of those modifications. With the isolated scanner, either all three changes are seen or none. For an example of this, try running the InterferenceTest example. + +
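The all-or-none guarantee can be illustrated with a toy model (this is plain Java, not Accumulo code; the class IsolationDemo and its in-memory "row" are stand-ins). A reader that streams a row column by column can observe a mix of old and new values when a mutation lands mid-read, while a reader that buffers the whole row first, as the IsolatedScanner does on the client side, sees a consistent row:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Toy model of row isolation: a "mutation" rewrites all three columns
// of a row, and we compare a streaming read against a buffered read.
public class IsolationDemo {
    public static void main(String[] args) {
        SortedMap<String, String> row =
            new TreeMap<>(Map.of("c1", "old", "c2", "old", "c3", "old"));

        // Non-isolated: read c1, then the mutation applies, then read c2 and c3.
        List<String> seen = new ArrayList<>();
        seen.add(row.get("c1"));
        row.replaceAll((k, v) -> "new"); // mutation applied mid-scan
        seen.add(row.get("c2"));
        seen.add(row.get("c3"));
        System.out.println(seen); // a partial row: mix of old and new

        // Isolated: snapshot the entire row before returning any of it.
        row.replaceAll((k, v) -> "old");
        SortedMap<String, String> snapshot = new TreeMap<>(row);
        row.replaceAll((k, v) -> "new"); // mutation lands after the snapshot
        System.out.println(new ArrayList<>(snapshot.values())); // all old, never mixed
    }
}
```

The first read returns a row that never existed as a whole; the buffered read returns either the entirely old or entirely new row, which is exactly the property isolation provides.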

At this time there is no client-side isolation support for the BatchScanner. You may consider using the WholeRowIterator with the BatchScanner to achieve isolation. The drawback of doing this is that entire rows are read into memory on the server side. If a row is too big, it may crash a tablet server. The IsolatedScanner buffers rows on the client side, so a large row will not crash a tablet server. + +

Iterators

+

When writing server-side iterators for Accumulo, isolation is something to be aware of. A scan-time iterator in Accumulo reads from a set of data sources. While an iterator is reading data, it has an isolated view. However, after it returns a key/value, it is possible that Accumulo may switch data sources and re-seek the iterator. This is done so that resources may be reclaimed. When the user does not request isolation, this can occur after any key is returned. When a user requests isolation, this will only occur after a new row is returned, in which case it will re-seek to the very beginning of the next possible row. http://git-wip-us.apache.org/repos/asf/accumulo/blob/acba59b1/server/src/main/resources/docs/lgroups.html ---------------------------------------------------------------------- diff --git a/server/src/main/resources/docs/lgroups.html b/server/src/main/resources/docs/lgroups.html new file mode 100644 index 0000000..0012ffb --- /dev/null +++ b/server/src/main/resources/docs/lgroups.html @@ -0,0 +1,42 @@ + + + +Accumulo Locality Groups + + + + +

Apache Accumulo Documentation : Locality Groups

+ +

Accumulo supports locality groups similar to those described in the Big Table paper. Locality groups allow vertical partitioning of data by column family. This allows users to configure their tables such that scans over a subset of column families are much faster. The Accumulo locality group model has the following features. + +

    +
  • There is a default locality group that holds all column families not in a declared locality group. +
  • No requirement to declare locality groups or column families at table creation. +
  • Can change locality group configuration on the fly. +
+ + +

When the locality group configuration for a table is changed, it has no effect on existing data. All minor and major compactions that occur after the change will organize data into the new locality group structure. As data is written into a table, it will cause minor and major compactions to occur. Over time this will result in all data being organized according to the new locality groups. If all data must be reorganized into the new locality groups immediately, this can be accomplished by forcing a full major compaction of the table. Use the compact command in the shell to accomplish this. + +

There are two ways to manipulate locality groups: via the shell or through the Java API. From the shell, use the getgroups and setgroups commands. Through the API, TableOperations has the methods setLocalityGroups() and getLocalityGroups(). + +
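The argument to setLocalityGroups() is a map from group name to the set of column families in that group. The sketch below shows only that data shape; the group names, family names, and the class LocalityGroupsSketch are made up, and the real API takes org.apache.hadoop.io.Text family names rather than the plain Strings used here so that the sketch runs without Accumulo on the classpath:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Sketch of the data shape passed to TableOperations.setLocalityGroups():
// group name -> set of column families stored together on disk.
public class LocalityGroupsSketch {
    public static void main(String[] args) {
        Map<String, Set<String>> groups = new HashMap<>();
        groups.put("content", Set.of("contents"));          // large values, scanned rarely
        groups.put("meta", Set.of("date", "title", "url")); // small values, scanned often

        // Any column family not listed in any group falls into the
        // default locality group automatically.
        System.out.println(groups);
        // With Accumulo on the classpath this map (with Text keys in the
        // inner sets) would be passed as:
        //   connector.tableOperations().setLocalityGroups("mytable", groups);
    }
}
```

Scans that fetch only the "meta" families then read just that group's portion of each file, which is where the speedup described above comes from.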

To limit scans to a set of locality groups, use the fetchColumnFamily() function on Scanner or BatchScanner. From the shell, use scan with the -c option. + + +