accumulo-commits mailing list archives

From e..@apache.org
Subject svn commit: r1431597 - in /accumulo/trunk/docs/examples: README.regex README.rowhash README.tabletofile README.terasort
Date Thu, 10 Jan 2013 20:11:15 GMT
Author: ecn
Date: Thu Jan 10 20:11:14 2013
New Revision: 1431597

URL: http://svn.apache.org/viewvc?rev=1431597&view=rev
Log:
ACCUMULO-279: add missing readmes

Added:
    accumulo/trunk/docs/examples/README.regex
    accumulo/trunk/docs/examples/README.rowhash
    accumulo/trunk/docs/examples/README.tabletofile
    accumulo/trunk/docs/examples/README.terasort

Added: accumulo/trunk/docs/examples/README.regex
URL: http://svn.apache.org/viewvc/accumulo/trunk/docs/examples/README.regex?rev=1431597&view=auto
==============================================================================
--- accumulo/trunk/docs/examples/README.regex (added)
+++ accumulo/trunk/docs/examples/README.regex Thu Jan 10 20:11:14 2013
@@ -0,0 +1,58 @@
+Title: Apache Accumulo Regex Example
+Notice:    Licensed to the Apache Software Foundation (ASF) under one
+           or more contributor license agreements.  See the NOTICE file
+           distributed with this work for additional information
+           regarding copyright ownership.  The ASF licenses this file
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this file except in compliance
+           with the License.  You may obtain a copy of the License at
+           .
+             http://www.apache.org/licenses/LICENSE-2.0
+           .
+           Unless required by applicable law or agreed to in writing,
+           software distributed under the License is distributed on an
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+
+This example uses mapreduce and accumulo to find items using regular expressions.
+This is accomplished using a map-only mapreduce job and a scan-time iterator.
+
+To run this example you will need some data in a table.  The following will
+put a trivial amount of data into accumulo using the accumulo shell:
+
+    $ ./bin/accumulo shell -u username -p password
+    Shell - Apache Accumulo Interactive Shell
+    - version: 1.5.0-SNAPSHOT
+    - instance name: instance
+    - instance id: 00000000-0000-0000-0000-000000000000
+    - 
+    - type 'help' for a list of available commands
+    - 
+    username@instance> createtable input
+    username@instance> insert dogrow dogcf dogcq dogvalue
+    username@instance> insert catrow catcf catcq catvalue
+    username@instance> quit
+
+The RegexExample class sets an iterator on the scanner.  This does pattern matching
+against each key/value in accumulo, and only returns matching items.  It will do this
+in parallel and will store the results in files in hdfs.
+
+The following will search for any rows in the input table that start with "dog":
+
+    $ bin/tool.sh lib/examples-simple*[^cs].jar org.apache.accumulo.examples.simple.mapreduce.RegexExample -u user -p passwd -i instance -t input --rowRegex 'dog.*' --output /tmp/output
+
+    $ hadoop fs -ls /tmp/output
+    Found 3 items
+    -rw-r--r--   1 username supergroup          0 2013-01-10 14:11 /tmp/output/_SUCCESS
+    drwxr-xr-x   - username supergroup          0 2013-01-10 14:10 /tmp/output/_logs
+    -rw-r--r--   1 username supergroup         51 2013-01-10 14:10 /tmp/output/part-m-00000
+
+We can see the output of our little map-reduce job:
+
+    $ hadoop fs -text /tmp/output/part-m-00000
+    dogrow dogcf:dogcq [] 1357844987994 false	dogvalue
+    $
+
+
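The row filter described in README.regex can be approximated outside Accumulo. The following is a minimal sketch in plain Python, assuming the row regex has to match the whole row. The in-memory `entries` list and the `rows_matching` helper are hypothetical stand-ins for the table and the scan-time iterator, not Accumulo APIs.

```python
import re

# Hypothetical in-memory stand-in for the "input" table: (row, cf, cq, value).
entries = [
    ("catrow", "catcf", "catcq", "catvalue"),
    ("dogrow", "dogcf", "dogcq", "dogvalue"),
]

def rows_matching(entries, row_regex):
    """Return the entries whose row matches row_regex in full (assumed semantics)."""
    pattern = re.compile(row_regex)
    return [e for e in entries if pattern.fullmatch(e[0])]

matches = rows_matching(entries, "dog.*")
for row, cf, cq, value in matches:
    # Same layout as the job output, without the timestamp and delete flag.
    print(f"{row} {cf}:{cq} []\t{value}")
```

Under this full-match assumption the pattern 'dog' alone selects nothing, which is why the example passes 'dog.*'.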

Added: accumulo/trunk/docs/examples/README.rowhash
URL: http://svn.apache.org/viewvc/accumulo/trunk/docs/examples/README.rowhash?rev=1431597&view=auto
==============================================================================
--- accumulo/trunk/docs/examples/README.rowhash (added)
+++ accumulo/trunk/docs/examples/README.rowhash Thu Jan 10 20:11:14 2013
@@ -0,0 +1,59 @@
+Title: Apache Accumulo RowHash Example
+Notice:    Licensed to the Apache Software Foundation (ASF) under one
+           or more contributor license agreements.  See the NOTICE file
+           distributed with this work for additional information
+           regarding copyright ownership.  The ASF licenses this file
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this file except in compliance
+           with the License.  You may obtain a copy of the License at
+           .
+             http://www.apache.org/licenses/LICENSE-2.0
+           .
+           Unless required by applicable law or agreed to in writing,
+           software distributed under the License is distributed on an
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+
+This example shows a simple map/reduce job that reads from an accumulo table and
+writes back into that table.
+
+To run this example you will need some data in a table.  The following will
+put a trivial amount of data into accumulo using the accumulo shell:
+
+    $ ./bin/accumulo shell -u username -p password
+    Shell - Apache Accumulo Interactive Shell
+    - version: 1.5.0-SNAPSHOT
+    - instance name: instance
+    - instance id: 00000000-0000-0000-0000-000000000000
+    - 
+    - type 'help' for a list of available commands
+    - 
+    username@instance> createtable input
+    username@instance> insert a-row cf cq value
+    username@instance> insert b-row cf cq value
+    username@instance> quit
+
+The RowHash class will insert a hash for each row in the table if it contains the
+specified column.  Here's how you run the map/reduce job:
+
+    $ bin/tool.sh lib/examples-simple*[^cs].jar org.apache.accumulo.examples.simple.mapreduce.RowHash -u user -p passwd -i instance -t input --column cf:cq
+
+Now we can scan the table and see the hashes:
+
+    $ ./bin/accumulo shell -u username -p password
+    Shell - Apache Accumulo Interactive Shell
+    - version: 1.5.0-SNAPSHOT
+    - instance name: instance
+    - instance id: 00000000-0000-0000-0000-000000000000
+    - 
+    - type 'help' for a list of available commands
+    - 
+    username@instance> scan -t input
+    a-row cf:cq []    value
+    a-row cf-HASHTYPE:cq-MD5BASE64 []    IGPBYI1uC6+AJJxC4r5YBA==
+    b-row cf:cq []    value
+    b-row cf-HASHTYPE:cq-MD5BASE64 []    IGPBYI1uC6+AJJxC4r5YBA==
+    username@instance> 
+
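The `cq-MD5BASE64` qualifier in the scan output suggests that the stored hash is the MD5 digest of the cell value, Base64-encoded. The following sketch checks that reading in plain Python. The `md5_base64` helper is a hypothetical name, and the sketch does not use the RowHash class itself.

```python
import base64
import hashlib

def md5_base64(value: bytes) -> str:
    """Base64-encode the raw 16-byte MD5 digest of value."""
    return base64.b64encode(hashlib.md5(value).digest()).decode("ascii")

# Both a-row and b-row store the same value, so they share one hash.
row_hash = md5_base64(b"value")
print(row_hash)
```

Because both rows hold the identical value `value`, they end up with the identical hash in the scan output.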

Added: accumulo/trunk/docs/examples/README.tabletofile
URL: http://svn.apache.org/viewvc/accumulo/trunk/docs/examples/README.tabletofile?rev=1431597&view=auto
==============================================================================
--- accumulo/trunk/docs/examples/README.tabletofile (added)
+++ accumulo/trunk/docs/examples/README.tabletofile Thu Jan 10 20:11:14 2013
@@ -0,0 +1,59 @@
+Title: Apache Accumulo Table-to-File Example
+Notice:    Licensed to the Apache Software Foundation (ASF) under one
+           or more contributor license agreements.  See the NOTICE file
+           distributed with this work for additional information
+           regarding copyright ownership.  The ASF licenses this file
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this file except in compliance
+           with the License.  You may obtain a copy of the License at
+           .
+             http://www.apache.org/licenses/LICENSE-2.0
+           .
+           Unless required by applicable law or agreed to in writing,
+           software distributed under the License is distributed on an
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+
+This example uses mapreduce to extract specified columns from an existing table.
+
+To run this example you will need some data in a table.  The following will
+put a trivial amount of data into accumulo using the accumulo shell:
+
+    $ ./bin/accumulo shell -u username -p password
+    Shell - Apache Accumulo Interactive Shell
+    - version: 1.5.0-SNAPSHOT
+    - instance name: instance
+    - instance id: 00000000-0000-0000-0000-000000000000
+    - 
+    - type 'help' for a list of available commands
+    - 
+    username@instance> createtable input
+    username@instance> insert dogrow cf cq dogvalue
+    username@instance> insert catrow cf cq catvalue
+    username@instance> insert junk family qualifier junkvalue
+    username@instance> quit
+
+The TableToFile class configures a map-only job to read the specified columns and
+write the key/value pairs to a file in HDFS.
+
+The following will extract the rows containing the column "cf:cq":
+
+    $ bin/tool.sh lib/examples-simple*[^cs].jar org.apache.accumulo.examples.simple.mapreduce.TableToFile
-u user -p passwd -i instance -t input --columns cf:cq --output /tmp/output
+
+    $ hadoop fs -ls /tmp/output
+    -rw-r--r--   1 username supergroup          0 2013-01-10 14:44 /tmp/output/_SUCCESS
+    drwxr-xr-x   - username supergroup          0 2013-01-10 14:44 /tmp/output/_logs
+    drwxr-xr-x   - username supergroup          0 2013-01-10 14:44 /tmp/output/_logs/history
+    -rw-r--r--   1 username supergroup       9049 2013-01-10 14:44 /tmp/output/_logs/history/job_201301081658_0011_1357847072863_username_TableToFile%5F1357847071434
+    -rw-r--r--   1 username supergroup      26172 2013-01-10 14:44 /tmp/output/_logs/history/job_201301081658_0011_conf.xml
+    -rw-r--r--   1 username supergroup         50 2013-01-10 14:44 /tmp/output/part-m-00000
+
+We can see the output of our little map-reduce job:
+
+    $ hadoop fs -text /tmp/output/part-m-00000
+    catrow cf:cq []	catvalue
+    dogrow cf:cq []	dogvalue
+    $
+
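The column extraction in README.tabletofile can be sketched without a cluster. This is a minimal Python illustration, assuming that `--columns cf:cq` keeps only entries whose family and qualifier both match. The `entries` list and `extract_columns` helper are hypothetical, and the row names follow the job output shown in the README.

```python
# Hypothetical in-memory stand-in for the "input" table: (row, cf, cq, value).
entries = [
    ("catrow", "cf", "cq", "catvalue"),
    ("dogrow", "cf", "cq", "dogvalue"),
    ("junk", "family", "qualifier", "junkvalue"),
]

def extract_columns(entries, columns):
    """Keep entries whose family:qualifier pair is listed in columns."""
    wanted = {tuple(c.split(":", 1)) for c in columns}
    return [e for e in entries if (e[1], e[2]) in wanted]

# One "key<TAB>value" line per surviving entry, as in part-m-00000.
lines = [
    f"{row} {cf}:{cq} []\t{value}"
    for row, cf, cq, value in extract_columns(entries, ["cf:cq"])
]
print("\n".join(lines))
```

The two surviving lines, each with a trailing newline, add up to 50 bytes, which agrees with the size of part-m-00000 in the directory listing.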

Added: accumulo/trunk/docs/examples/README.terasort
URL: http://svn.apache.org/viewvc/accumulo/trunk/docs/examples/README.terasort?rev=1431597&view=auto
==============================================================================
--- accumulo/trunk/docs/examples/README.terasort (added)
+++ accumulo/trunk/docs/examples/README.terasort Thu Jan 10 20:11:14 2013
@@ -0,0 +1,50 @@
+Title: Apache Accumulo Terasort Example
+Notice:    Licensed to the Apache Software Foundation (ASF) under one
+           or more contributor license agreements.  See the NOTICE file
+           distributed with this work for additional information
+           regarding copyright ownership.  The ASF licenses this file
+           to you under the Apache License, Version 2.0 (the
+           "License"); you may not use this file except in compliance
+           with the License.  You may obtain a copy of the License at
+           .
+             http://www.apache.org/licenses/LICENSE-2.0
+           .
+           Unless required by applicable law or agreed to in writing,
+           software distributed under the License is distributed on an
+           "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+           KIND, either express or implied.  See the License for the
+           specific language governing permissions and limitations
+           under the License.
+
+This example uses map/reduce to generate random input data that will
+be sorted by storing it into accumulo.  It uses data very similar to the
+hadoop terasort benchmark.
+
+Run this example with arguments describing the amount of data:
+
+    $ bin/tool.sh lib/examples-simple*[^cs].jar org.apache.accumulo.examples.simple.mapreduce.TeraSortIngest \
+    -i instance -z zookeepers -u user -p password \
+    --count 10 \
+    --minKeySize 10 \
+    --maxKeySize 10 \
+    --minValueSize 78 \
+    --maxValueSize 78 \
+    --table sort \
+    --splits 10
+
+After the map reduce job completes, scan the data:
+
+    $ ./bin/accumulo shell -u username -p password
+    username@instance> scan -t sort 
+    +l-$$OE/ZH c:         4 []    GGGGGGGGGGWWWWWWWWWWMMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOO
+    ,C)wDw//u= c:        10 []    CCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKK
+    75@~?'WdUF c:         1 []    IIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQ
+    ;L+!2rT~hd c:         8 []    MMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUU
+    LsS8)|.ZLD c:         5 []    OOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGGGGWWWWWWWW
+    M^*dDE;6^< c:         9 []    UUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGGGGWWWWWWWWWWMMMMMMMMMMCCCCCCCC
+    ^Eu)<n#kdP c:         3 []    YYYYYYYYYYOOOOOOOOOOEEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGG
+    le5awB.$sm c:         6 []    WWWWWWWWWWMMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYYYYOOOOOOOOOOEEEEEEEE
+    q__[fwhKFg c:         7 []    EEEEEEEEEEUUUUUUUUUUKKKKKKKKKKAAAAAAAAAAQQQQQQQQQQGGGGGGGGGGWWWWWWWWWWMMMMMMMM
+    w[o||:N&H, c:         2 []    QQQQQQQQQQGGGGGGGGGGWWWWWWWWWWMMMMMMMMMMCCCCCCCCCCSSSSSSSSSSIIIIIIIIIIYYYYYYYY
+
+Of course, a real benchmark would ingest millions of entries.
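
The idea of sorting by ingest can be illustrated on a small scale. The sketch below is plain Python under loud assumptions: it does not reproduce the TeraSortIngest key or value generator, `random_entries` is a hypothetical helper, and a dict iterated in sorted key order stands in for an Accumulo table scan.

```python
import random
import string

def random_entries(count, key_size, value_size, seed=0):
    """Build count entries with random printable keys (illustrative generator only)."""
    rng = random.Random(seed)
    alphabet = string.ascii_letters + string.digits + string.punctuation
    table = {}
    while len(table) < count:  # retry on the unlikely event of a duplicate key
        key = "".join(rng.choice(alphabet) for _ in range(key_size)).encode("ascii")
        value = (rng.choice(string.ascii_uppercase) * value_size).encode("ascii")
        table[key] = value
    return table

table = random_entries(count=10, key_size=10, value_size=78)

# Entries are written in random order; reading them back in byte order of
# the key mimics what a scan of a sorted table returns.
scan = sorted(table.items())
for key, value in scan:
    print(key.decode("ascii"), value.decode("ascii"))
```

The sizes mirror the `--count`, `--minKeySize`/`--maxKeySize` and `--minValueSize`/`--maxValueSize` arguments used in the command above.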


