http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/c0655661/1.3/user_manual/Writing_Accumulo_Clients.html ---------------------------------------------------------------------- diff --git a/1.3/user_manual/Writing_Accumulo_Clients.html b/1.3/user_manual/Writing_Accumulo_Clients.html new file mode 100644 index 0000000..79cfabd --- /dev/null +++ b/1.3/user_manual/Writing_Accumulo_Clients.html @@ -0,0 +1,303 @@ + + + + + + + + + + + + +User Manual: Writing Accumulo Clients + + + + + + + + + + + + +
+
+
+ + +
+ +

User Manual: Writing Accumulo Clients

+ +

** Next:** Table Configuration ** Up:** Apache Accumulo User Manual Version 1.3 ** Previous:** Accumulo Shell ** Contents**

+ +

Subsections

+ + + +
+ +

Writing Accumulo Clients

+ +

All clients must first identify the Accumulo instance with which they will be communicating. Code to do this is as follows:

+ +
String instanceName = "myinstance";
+String zooServers = "zooserver-one,zooserver-two";
+Instance inst = new ZooKeeperInstance(instanceName, zooServers);
+
+Connector conn = new Connector(inst, "user","passwd".getBytes());
+
+
+ +

Writing Data

+ +

Data are written to Accumulo by creating Mutation objects that represent all the changes to the columns of a single row. The changes are made atomically in the TabletServer. Clients then add Mutations to a BatchWriter which submits them to the appropriate TabletServers.

+ +

Mutations can be created thus:

+ +
Text rowID = new Text("row1");
+Text colFam = new Text("myColFam");
+Text colQual = new Text("myColQual");
+ColumnVisibility colVis = new ColumnVisibility("public");
+long timestamp = System.currentTimeMillis();
+
+Value value = new Value("myValue".getBytes());
+
+Mutation mutation = new Mutation(rowID);
+mutation.put(colFam, colQual, colVis, timestamp, value);
+
+
+ +

BatchWriter

+ +

The BatchWriter is highly optimized to send Mutations to multiple TabletServers and automatically batches Mutations destined for the same TabletServer to amortize network overhead. Care must be taken to avoid changing the contents of any Object passed to the BatchWriter since it keeps objects in memory while batching.
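The warning above can be illustrated with plain Java (a hypothetical, Accumulo-free sketch): a buffer that holds references, as the BatchWriter does while batching, sees any later change made to a buffered object.

```java
import java.util.ArrayList;
import java.util.List;

public class AliasingDemo {
    public static void main(String[] args) {
        // Stand-in for the BatchWriter's in-memory batch: it stores
        // references, not copies.
        List<byte[]> batch = new ArrayList<byte[]>();

        byte[] value = "myValue".getBytes();
        batch.add(value);   // only the reference is buffered

        value[0] = 'X';     // mutating the object after "adding" it ...

        // ... changes what would eventually be sent to the TabletServer.
        System.out.println(new String(batch.get(0)));  // prints "XyValue"
    }
}
```

This is why a Mutation (and the Text and Value objects placed in it) should not be modified after being handed to the BatchWriter.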

+ +

Mutations are added to a BatchWriter thus:

+ +
long memBuf = 1000000L; // bytes to store before sending a batch
+long timeout = 1000L; // milliseconds to wait before sending
+int numThreads = 10;
+
+BatchWriter writer =
+    conn.createBatchWriter("table", memBuf, timeout, numThreads);
+
+writer.add(mutation);
+
+writer.close();
+
+
+ +

An example of using the batch writer can be found at
+accumulo/docs/examples/README.batch

+ +

Reading Data

+ +

Accumulo is optimized to quickly retrieve the value associated with a given key, and to efficiently return ranges of consecutive keys and their associated values.

+ +

Scanner

+ +

To retrieve data, clients use a Scanner, which acts like an Iterator over keys and values. Scanners can be configured to start and stop at particular keys, and to return a subset of the columns available.

+ +
// specify which visibilities we are allowed to see
+Authorizations auths = new Authorizations("public");
+
+Scanner scan =
+    conn.createScanner("table", auths);
+
+scan.setRange(new Range("harry","john"));
+scan.fetchColumnFamily(new Text("attributes"));
+
+for(Entry<Key,Value> entry : scan) {
+    Text row = entry.getKey().getRow();
+    Value value = entry.getValue();
+}
+
+
+ +

BatchScanner

+ +

For some types of access, it is more efficient to retrieve several ranges simultaneously. This arises, for example, when accessing a set of non-consecutive rows whose IDs have been retrieved from a secondary index.

+ +

The BatchScanner is configured similarly to the Scanner; it can be configured to retrieve a subset of the columns available, but rather than passing a single Range, BatchScanners accept a set of Ranges. It is important to note that the keys returned by a BatchScanner are not in sorted order, since the keys are streamed from multiple TabletServers in parallel.

+ +
ArrayList<Range> ranges = new ArrayList<Range>();
+// populate list of ranges ...
+
+BatchScanner bscan =
+    conn.createBatchScanner("table", auths, 10);
+
+bscan.setRanges(ranges);
+bscan.fetchColumnFamily(new Text("attributes"));
+
+for(Entry<Key,Value> entry : bscan)
+    System.out.println(entry.getValue());
+
+
+ +

An example of the BatchScanner can be found at
+accumulo/docs/examples/README.batch

+ +
+ +


+ + +
+ + + + + +
+
+
+ + http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/c0655661/1.3/user_manual/accumulo_user_manual.html ---------------------------------------------------------------------- diff --git a/1.3/user_manual/accumulo_user_manual.html b/1.3/user_manual/accumulo_user_manual.html new file mode 100644 index 0000000..9591e69 --- /dev/null +++ b/1.3/user_manual/accumulo_user_manual.html @@ -0,0 +1,214 @@ + + + + + + + + + + + + +User Manual: index + + + + + + + + + + + + +
+
+
+ + +
+ +

User Manual: index

+ +

** Next:** Contents ** Contents**

+ +

Apache Accumulo User Manual

+

Version 1.3

+ +
+ +

+ + + +
+ + +
+ + + + + +
+
+
+ + http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/c0655661/1.3/user_manual/data_distribution.png ---------------------------------------------------------------------- diff --git a/1.3/user_manual/data_distribution.png b/1.3/user_manual/data_distribution.png new file mode 100644 index 0000000..71b585b Binary files /dev/null and b/1.3/user_manual/data_distribution.png differ http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/c0655661/1.3/user_manual/examples.html ---------------------------------------------------------------------- diff --git a/1.3/user_manual/examples.html b/1.3/user_manual/examples.html new file mode 100644 index 0000000..d06b631 --- /dev/null +++ b/1.3/user_manual/examples.html @@ -0,0 +1,10 @@ + + + +Redirecting… + + +

Redirecting…

+Click here if you are not redirected. + + http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/c0655661/1.3/user_manual/examples/aggregation.html ---------------------------------------------------------------------- diff --git a/1.3/user_manual/examples/aggregation.html b/1.3/user_manual/examples/aggregation.html new file mode 100644 index 0000000..2e72c14 --- /dev/null +++ b/1.3/user_manual/examples/aggregation.html @@ -0,0 +1,222 @@ + + + + + + + + + + + + +Aggregation Example + + + + + + + + + + + + +
+
+
+ + +
+ +

Aggregation Example

+ +

This is a simple aggregation example. To build this example run maven and then
+copy the produced jar into the accumulo lib dir. This is already done in the
+tar distribution.

+ +
$ bin/accumulo shell -u username
+Enter current password for 'username'@'instance': ***
+
+Shell - Apache Accumulo Interactive Shell
+- 
+- version: 1.3.x-incubating
+- instance name: instance
+- instance id: 00000000-0000-0000-0000-000000000000
+- 
+- type 'help' for a list of available commands
+- 
+username@instance> createtable aggtest1 -a app=org.apache.accumulo.examples.aggregation.SortedSetAggregator
+username@instance aggtest1> insert foo app 1 a
+username@instance aggtest1> insert foo app 1 b
+username@instance aggtest1> scan
+foo app:1 []  a,b
+username@instance aggtest1> insert foo app 1 z,1,foo,w
+username@instance aggtest1> scan
+foo app:1 []  1,a,b,foo,w,z
+username@instance aggtest1> insert foo app 2 cat,dog,muskrat
+username@instance aggtest1> insert foo app 2 mouse,bird
+username@instance aggtest1> scan
+foo app:1 []  1,a,b,foo,w,z
+foo app:2 []  bird,cat,dog,mouse,muskrat
+username@instance aggtest1> 
+
+
+ +

In this example a table is created and the example set aggregator is
+applied to the column family app.

+ +
+ + + + + +
+
+
+ + http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/c0655661/1.3/user_manual/examples/batch.html ---------------------------------------------------------------------- diff --git a/1.3/user_manual/examples/batch.html b/1.3/user_manual/examples/batch.html new file mode 100644 index 0000000..3cb8289 --- /dev/null +++ b/1.3/user_manual/examples/batch.html @@ -0,0 +1,227 @@ + + + + + + + + + + + + +Batch Writing and Scanning Example + + + + + + + + + + + + +
+
+
+ + +
+ +

Batch Writing and Scanning Example

+ +

This is an example of how to use the batch writer and batch scanner. To compile
+the example, run maven and copy the produced jar into the accumulo lib dir.
+This is already done in the tar distribution.

+ +

Below are commands that add 10000 entries to accumulo and then do 100 random
+queries. The write command generates random 50 byte values.

+ +

Be sure to use the name of your instance (given as instance here) and the appropriate
+list of zookeeper nodes (given as zookeepers here).

+ +

Before you run this, you must ensure that the user you are running as has the
+“exampleVis” authorization. (You can set this in the shell with “setauths -u username -s exampleVis”.)

+ +
$ ./bin/accumulo shell -u root
+> setauths -u username -s exampleVis
+> exit
+
+
+ +

You must also create the table, batchtest1, ahead of time. (In the shell, use “createtable batchtest1”)

+ +
$ ./bin/accumulo shell -u username
+> createtable batchtest1
+> exit
+$ ./bin/accumulo org.apache.accumulo.examples.client.SequentialBatchWriter instance zookeepers username password batchtest1 0 10000 50 20000000 500 20 exampleVis
+$ ./bin/accumulo org.apache.accumulo.examples.client.RandomBatchScanner instance zookeepers username password batchtest1 100 0 10000 50 20 exampleVis
+07 11:33:11,103 [client.CountingVerifyingReceiver] INFO : Generating 100 random queries...
+07 11:33:11,112 [client.CountingVerifyingReceiver] INFO : finished
+07 11:33:11,260 [client.CountingVerifyingReceiver] INFO : 694.44 lookups/sec   0.14 secs
+
+07 11:33:11,260 [client.CountingVerifyingReceiver] INFO : num results : 100
+
+07 11:33:11,364 [client.CountingVerifyingReceiver] INFO : Generating 100 random queries...
+07 11:33:11,370 [client.CountingVerifyingReceiver] INFO : finished
+07 11:33:11,416 [client.CountingVerifyingReceiver] INFO : 2173.91 lookups/sec   0.05 secs
+
+07 11:33:11,416 [client.CountingVerifyingReceiver] INFO : num results : 100
+
+
+ +
+ + + + + +
+
+
+ + http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/c0655661/1.3/user_manual/examples/bloom.html ---------------------------------------------------------------------- diff --git a/1.3/user_manual/examples/bloom.html b/1.3/user_manual/examples/bloom.html new file mode 100644 index 0000000..516c1b1 --- /dev/null +++ b/1.3/user_manual/examples/bloom.html @@ -0,0 +1,313 @@ + + + + + + + + + + + + +Bloom Filter Example + + + + + + + + + + + + +
+
+
+ + +
+ +

Bloom Filter Example

+ +

This example shows how to create a table with bloom filters enabled. It also
+shows how bloom filters increase query performance when looking for values that
+do not exist in a table.

+ +

Below, a table named bloom_test is created and bloom filters are enabled.

+ +
$ ./bin/accumulo shell -u username -p password
+Shell - Apache Accumulo Interactive Shell
+- version: 1.3.x-incubating
+- instance name: instance
+- instance id: 00000000-0000-0000-0000-000000000000
+- 
+- type 'help' for a list of available commands
+- 
+username@instance> setauths -u username -s exampleVis
+username@instance> createtable bloom_test
+username@instance bloom_test> config -t bloom_test -s table.bloom.enabled=true
+username@instance bloom_test> exit
+
+
+ +

Below, 1 million random values are inserted into accumulo. The randomly
+generated rows range between 0 and 1 billion. The random number generator is
+initialized with the seed 7.

+ +
$ ./bin/accumulo org.apache.accumulo.examples.client.RandomBatchWriter -s 7 instance zookeepers username password bloom_test 1000000 0 1000000000 50 2000000 60000 3 exampleVis
+
+
+ +

Below, the table is flushed. Look at the monitor page and wait for the flush to
+complete.

+ +
$ ./bin/accumulo shell -u username -p password
+username@instance> flush -t bloom_test
+Flush of table bloom_test initiated...
+username@instance> exit
+
+
+ +

The flush will be finished when there are no entries in memory and the
+number of minor compactions goes to zero. Refresh the page to see changes to the table.

+ +

After the flush completes, 500 random queries are done against the table. The
+same seed is used to generate the queries, therefore everything is found in the
+table.

+ +
$ ./bin/accumulo org.apache.accumulo.examples.client.RandomBatchScanner -s 7 instance zookeepers username password bloom_test 500 0 1000000000 50 20 exampleVis
+Generating 500 random queries...finished
+96.19 lookups/sec   5.20 secs
+num results : 500
+Generating 500 random queries...finished
+102.35 lookups/sec   4.89 secs
+num results : 500
+
+
+ +

Below, another 500 queries are performed using a different seed, which results
+in nothing being found. In this case the lookups are much faster because of
+the bloom filters.

+ +
$ ./bin/accumulo org.apache.accumulo.examples.client.RandomBatchScanner -s 8 instance zookeepers username password bloom_test 500 0 1000000000 50 20 exampleVis
+Generating 500 random queries...finished
+2212.39 lookups/sec   0.23 secs
+num results : 0
+Did not find 500 rows
+Generating 500 random queries...finished
+4464.29 lookups/sec   0.11 secs
+num results : 0
+Did not find 500 rows
+
+
+ +
+ +

Bloom filters can also speed up lookups for entries that exist. In accumulo,
+data is divided into tablets and each tablet has multiple map files. Every
+lookup in accumulo goes to a specific tablet where a lookup is done on each
+map file in the tablet. So if a tablet has three map files, lookup performance
+can be three times slower than for a tablet with one map file. However, if the map
+files contain unique sets of data, then bloom filters can help eliminate map
+files that do not contain the row being looked up. To illustrate this, two
+identical tables were created using the following process. One table had bloom
+filters, the other did not. Also, the major compaction ratio was increased to
+prevent the files from being compacted into one file.

+ +
    +
  • Insert 1 million entries using RandomBatchWriter with a seed of 7
  • +
  • Flush the table using the shell
  • +
  • Insert 1 million entries using RandomBatchWriter with a seed of 8
  • +
  • Flush the table using the shell
  • +
  • Insert 1 million entries using RandomBatchWriter with a seed of 9
  • +
  • Flush the table using the shell
  • +
+ +
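The three insert/flush rounds above can be sketched with the same commands used earlier in this example (shown for bloom_test1 with seed 7; repeat with -s 8 and -s 9, and likewise for bloom_test2 — instance, zookeepers, and credentials are placeholders as before):

```shell
# One round: insert 1 million entries with a given seed, then flush.
$ ./bin/accumulo org.apache.accumulo.examples.client.RandomBatchWriter -s 7 instance zookeepers username password bloom_test1 1000000 0 1000000000 50 2000000 60000 3 exampleVis
$ ./bin/accumulo shell -u username -p password
username@instance> flush -t bloom_test1
username@instance> exit
```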

After following the above steps, each table will have a tablet with three map
+files. Each map file will contain 1 million entries generated with a different
+seed.

+ +

Below, 500 lookups are done against the table without bloom filters using
+random number generator seed 7. Even though only one map file will likely contain entries for this
+seed, all map files will be interrogated.

+ +
$ ./bin/accumulo org.apache.accumulo.examples.client.RandomBatchScanner -s 7 instance zookeepers username password bloom_test1 500 0 1000000000 50 20 exampleVis
+Generating 500 random queries...finished
+35.09 lookups/sec  14.25 secs
+num results : 500
+Generating 500 random queries...finished
+35.33 lookups/sec  14.15 secs
+num results : 500
+
+
+ +

Below, the same lookups are done against the table with bloom filters. The
+lookups were 2.86 times faster because only one map file was used, even though three
+map files existed.

+ +
$ ./bin/accumulo org.apache.accumulo.examples.client.RandomBatchScanner -s 7 instance zookeepers username password bloom_test2 500 0 1000000000 50 20 exampleVis
+Generating 500 random queries...finished
+99.03 lookups/sec   5.05 secs
+num results : 500
+Generating 500 random queries...finished
+101.15 lookups/sec   4.94 secs
+num results : 500
+
+
+ +
+ + + + + +
+
+
+ + http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/c0655661/1.3/user_manual/examples/bulkIngest.html ---------------------------------------------------------------------- diff --git a/1.3/user_manual/examples/bulkIngest.html b/1.3/user_manual/examples/bulkIngest.html new file mode 100644 index 0000000..4789bb7 --- /dev/null +++ b/1.3/user_manual/examples/bulkIngest.html @@ -0,0 +1,206 @@ + + + + + + + + + + + + +Bulk Ingest Example + + + + + + + + + + + + +
+
+
+ + +
+ +

Bulk Ingest Example

+ +

This is an example of how to bulk ingest data into accumulo using MapReduce.

+ +

The following commands show how to run this example. This example creates a
+table called test_bulk which has two initial split points. Then 1000 rows of
+test data are created in HDFS. After that the 1000 rows are ingested into
+accumulo. Then we verify the 1000 rows are in accumulo. The
+first two arguments to all of the commands except for GenerateTestData are the
+accumulo instance name, and a comma-separated list of zookeepers.

+ +
$ ./bin/accumulo org.apache.accumulo.examples.mapreduce.bulk.SetupTable instance zookeepers username password test_bulk row_00000333 row_00000666
+$ ./bin/accumulo org.apache.accumulo.examples.mapreduce.bulk.GenerateTestData 0 1000 bulk/test_1.txt
+
+$ ./bin/tool.sh lib/accumulo-examples-*[^c].jar org.apache.accumulo.examples.mapreduce.bulk.BulkIngestExample instance zookeepers username password test_bulk bulk tmp/bulkWork
+$ ./bin/accumulo org.apache.accumulo.examples.mapreduce.bulk.VerifyIngest instance zookeepers username password test_bulk 0 1000
+
+
+ +

For a high level discussion of bulk ingest, see the docs dir.

+ +
+ + + + + +
+
+
+ + http://git-wip-us.apache.org/repos/asf/accumulo-website/blob/c0655661/1.3/user_manual/examples/constraints.html ---------------------------------------------------------------------- diff --git a/1.3/user_manual/examples/constraints.html b/1.3/user_manual/examples/constraints.html new file mode 100644 index 0000000..5c7d52a --- /dev/null +++ b/1.3/user_manual/examples/constraints.html @@ -0,0 +1,220 @@ + + + + + + + + + + + + +Constraints Example + + + + + + + + + + + + +
+
+
+ + +
+ +

Constraints Example

+ +

This is an example of how to create a table with constraints. Below, a table is
+created with two example constraints. One constraint does not allow non-alphanumeric
+keys. The other constraint does not allow non-numeric values. Two
+inserts that violate these constraints are attempted and denied. The scan at
+the end shows the inserts were not allowed.

+ +
$ ./bin/accumulo shell -u username -p pass
+
+Shell - Apache Accumulo Interactive Shell
+- 
+- version: 1.3.x-incubating
+- instance name: instance
+- instance id: 00000000-0000-0000-0000-000000000000
+- 
+- type 'help' for a list of available commands
+- 
+username@instance> createtable testConstraints
+username@instance testConstraints> config -t testConstraints -s table.constraint.1=org.apache.accumulo.examples.constraints.NumericValueConstraint
+username@instance testConstraints> config -t testConstraints -s table.constraint.2=org.apache.accumulo.examples.constraints.AlphaNumKeyConstraint
+username@instance testConstraints> insert r1 cf1 cq1 1111
+username@instance testConstraints> insert r1 cf1 cq1 ABC
+  Constraint Failures:
+      ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.constraints.NumericValueConstraint, violationCode:1, violationDescription:Value is not numeric, numberOfViolatingMutations:1)
+username@instance testConstraints> insert r1! cf1 cq1 ABC 
+  Constraint Failures:
+      ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.constraints.NumericValueConstraint, violationCode:1, violationDescription:Value is not numeric, numberOfViolatingMutations:1)
+      ConstraintViolationSummary(constrainClass:org.apache.accumulo.examples.constraints.AlphaNumKeyConstraint, violationCode:1, violationDescription:Row was not alpha numeric, numberOfViolatingMutations:1)
+username@instance testConstraints> scan
+r1 cf1:cq1 []    1111
+username@instance testConstraints> 
+
+
+ +
+ + + + + +
+
+
+ +