Subject: svn commit: r870045 [10/24] - in /websites/staging/accumulo/trunk/content: ./ 1.4/examples/ 1.4/user_manual/ 1.5/examples/ css/ downloads/ example/ governance/ user_manual_1.3-incubating/ user_manual_1.3-incubating/examples/
Date: Thu, 18 Jul 2013 22:46:39 -0000
From: buildbot@apache.org
To: accumulo-commits@incubator.apache.org
Modified: websites/staging/accumulo/trunk/content/1.5/examples/bloom.html

Apache Accumulo Bloom Filter Example

    This example shows how bloom filters increase query performance when looking for values that do not exist in a table.
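
    The property that makes this work can be shown with a minimal bloom filter in Python. This is a hypothetical sketch, not Accumulo's implementation: the `BloomFilter` class, the SHA-256 double hashing and the textbook sizing formulas are assumptions chosen for illustration. It demonstrates that a negative answer is always correct (no false negatives), while a positive answer is occasionally wrong, which is why a lookup for a missing row can usually skip a file entirely.

```python
import hashlib
import math


class BloomFilter:
    """Minimal bloom filter over strings; an illustration, not Accumulo's implementation."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        n = max(1, expected_items)
        # Textbook sizing: m = -n * ln(p) / (ln 2)^2 bits and k = (m / n) * ln 2 hash functions.
        self.num_bits = max(1, math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str):
        # Double hashing: k bit positions derived from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Store every 100th row ID, then query the IDs in between, none of which were stored.
bf = BloomFilter(expected_items=10_000, false_positive_rate=0.01)
stored = [f"row_{i:010d}" for i in range(0, 1_000_000, 100)]
for row in stored:
    bf.add(row)
absent = [f"row_{i:010d}" for i in range(50, 1_000_000, 100)]

assert all(bf.might_contain(row) for row in stored)  # no false negatives, ever
observed_fp = sum(bf.might_contain(row) for row in absent) / len(absent)
print(f"{bf.num_bits} bits, {bf.num_hashes} hash functions, false-positive rate {observed_fp:.4f}")
```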

    Below, a table named bloom_test is created and bloom filters are enabled.

    $ ./bin/accumulo shell -u username -p password
    Shell - Apache Accumulo Interactive Shell
    - version: 1.5.0
    - instance name: instance
    - instance id: 00000000-0000-0000-0000-000000000000
    -
    - type 'help' for a list of available commands
    -
    username@instance> setauths -u username -s exampleVis
    username@instance> createtable bloom_test
    username@instance bloom_test> config -t bloom_test -s table.bloom.enabled=true
    username@instance bloom_test> exit
     

    Below, 1 million random values are inserted into Accumulo. The randomly generated rows range between 0 and 1 billion. The random number generator is initialized with seed 7.

    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 7 -i instance -z zookeepers -u username -p password -t bloom_test --num 1000000 --min 0 --max 1000000000 --size 50 --batchMemory 2M --batchLatency 60s --batchThreads 3 --vis exampleVis
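
    The effect of the seed can be sketched in Python. This is a hypothetical model: `random_rows` is an illustrative helper, Python's `random.Random` will not reproduce the rows that RandomBatchWriter actually generates, and the `row_` prefix with 10 zero-padded digits is inferred from the keys that rfile-info prints later in this example. It shows why queries generated from seed 7 find every row, while queries from another seed find almost none.

```python
import random


def random_rows(seed: int, count: int, low: int = 0, high: int = 1_000_000_000) -> list[str]:
    """Hypothetical stand-in for the example's generator: `count` seeded, uniform row IDs in [low, high)."""
    rng = random.Random(seed)
    # Zero-padded to 10 digits so that lexicographic order matches numeric order.
    return [f"row_{rng.randrange(low, high):010d}" for _ in range(count)]


inserted = set(random_rows(seed=7, count=1_000_000))   # data written with --seed 7
same_seed_queries = random_rows(seed=7, count=500)     # queries generated with --seed 7
other_seed_queries = random_rows(seed=8, count=500)    # queries generated with --seed 8

found_same = sum(row in inserted for row in same_seed_queries)
found_other = sum(row in inserted for row in other_seed_queries)
print(f"seed 7 queries found: {found_same}/500; seed 8 queries found: {found_other}/500")
```

    With 1 million rows drawn from 1 billion possible values, a query from a different seed has about a 1 in 1,000 chance of hitting an existing row, so 500 such queries are expected to find fewer than one row on average.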
     

    Below, the table is flushed:

    $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test -w'
    05 10:40:06,069 [shell.Shell] INFO : Flush of table bloom_test completed.
     

    After the flush completes, 500 random queries are run against the table. Because the queries are generated with the same seed, every queried row is found in the table.

    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchScanner --seed 7 -i instance -z zookeepers -u username -p password -t bloom_test --num 500 --min 0 --max 1000000000 --size 50 --scanThreads 20 --auths exampleVis
    Generating 500 random queries...finished
    96.19 lookups/sec   5.20 secs
    num results : 500
    Generating 500 random queries...finished
    102.35 lookups/sec   4.89 secs
    num results : 500
     

    Below, another 500 queries are performed using a different seed, so none of the queried rows exist in the table. These lookups are much faster because a bloom filter can report that a row is definitely not in a file, which lets Accumulo skip searching that file.

    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchScanner --seed 8 -i instance -z zookeepers -u username -p password -t bloom_test --num 500 --min 0 --max 1000000000 --size 50 --scanThreads 20 --auths exampleVis
    Generating 500 random queries...finished
    2212.39 lookups/sec   0.23 secs
    num results : 0
    Did not find 500 rows
    Generating 500 random queries...finished
    4464.29 lookups/sec   0.11 secs
    num results : 0
    Did not find 500 rows
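
    To quantify the difference, the Python snippet below divides the lookup rates printed above, pairing the runs in the order they appear. It introduces no new measurements; the variable names are illustrative.

```python
# Lookup rates (lookups/sec) copied from the RandomBatchScanner runs above, in the order shown.
rates_rows_present = [96.19, 102.35]    # --seed 7: every queried row exists
rates_rows_absent = [2212.39, 4464.29]  # --seed 8: no queried row exists

speedups = [absent / present for absent, present in zip(rates_rows_absent, rates_rows_present)]
for run, speedup in enumerate(speedups, start=1):
    print(f"run {run}: {speedup:.1f}x more lookups/sec when the rows are absent")
```

    For these runs, the lookups for absent rows were roughly 23 and 44 times faster.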
     
    … Each map file will contain 1 million entries … This assumes that Accumulo is configured with enough memory to hold 1 million inserts. If not, more map files will be created.

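    The commands for creating the first table, without bloom filters, are below. The major compaction ratio is raised to 7 so that the three flushed files are not merged into one by a major compaction.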

    $ ./bin/accumulo shell -u username -p password
    Shell - Apache Accumulo Interactive Shell
    - version: 1.5.0
    - instance name: instance
    - instance id: 00000000-0000-0000-0000-000000000000
    -
    - type 'help' for a list of available commands
    -
    username@instance> setauths -u username -s exampleVis
    username@instance> createtable bloom_test1
    username@instance bloom_test1> config -t bloom_test1 -s table.compaction.major.ratio=7
    username@instance bloom_test1> exit

    $ ARGS="-i instance -z zookeepers -u username -p password -t bloom_test1 --num 1000000 --min 0 --max 1000000000 --size 50 --batchMemory 2M --batchLatency 60s --batchThreads 3 --vis exampleVis"
    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 7 $ARGS
    $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test1 -w'
    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 8 $ARGS
    $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test1 -w'
    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 9 $ARGS
    $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test1 -w'
     

    The commands for creating the second table, with bloom filters, are below.

    $ ./bin/accumulo shell -u username -p password
    Shell - Apache Accumulo Interactive Shell
    - version: 1.5.0
    - instance name: instance
    - instance id: 00000000-0000-0000-0000-000000000000
    -
    - type 'help' for a list of available commands
    -
    username@instance> setauths -u username -s exampleVis
    username@instance> createtable bloom_test2
    username@instance bloom_test2> config -t bloom_test2 -s table.compaction.major.ratio=7
    username@instance bloom_test2> config -t bloom_test2 -s table.bloom.enabled=true
    username@instance bloom_test2> exit

    $ ARGS="-i instance -z zookeepers -u username -p password -t bloom_test2 --num 1000000 --min 0 --max 1000000000 --size 50 --batchMemory 2M --batchLatency 60s --batchThreads 3 --vis exampleVis"
    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 7 $ARGS
    $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test2 -w'
    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 8 $ARGS
    $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test2 -w'
    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchWriter --seed 9 $ARGS
    $ ./bin/accumulo shell -u username -p password -e 'flush -t bloom_test2 -w'
     

    Below, 500 lookups are done against the table without bloom filters, using random number generator seed 7. Even though only one map file is likely to contain entries for this seed, all map files are searched.

    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchScanner --seed 7 -i instance -z zookeepers -u username -p password -t bloom_test1 --num 500 --min 0 --max 1000000000 --size 50 --scanThreads 20 --auths exampleVis
    Generating 500 random queries...finished
    35.09 lookups/sec  14.25 secs
    num results : 500
    Generating 500 random queries...finished
    35.33 lookups/sec  14.15 secs
    num results : 500
     

    Below, the same lookups are done against the table with bloom filters. The lookups were 2.86 times faster because the bloom filters ruled out the map files that could not contain each row, so typically only one of the three map files was searched.

    $ ./bin/accumulo org.apache.accumulo.examples.simple.client.RandomBatchScanner --seed 7 -i instance -z zookeepers -u username -p password -t bloom_test2 --num 500 --min 0 --max 1000000000 --size 50 --scanThreads 20 --auths exampleVis
    Generating 500 random queries...finished
    99.03 lookups/sec   5.05 secs
    num results : 500
    Generating 500 random queries...finished
    101.15 lookups/sec   4.94 secs
    num results : 500
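
    The difference between the two tables can be modeled with a scaled-down Python simulation. Everything here is hypothetical: the `MapFile` class, the filter size (200,000 bits, 7 hash functions) and the 10,000-row files are illustrative choices, not Accumulo internals. It counts how many files 500 lookups for seed-7 rows must search in a tablet holding three files, with and without a bloom filter per file.

```python
import hashlib
import random


def bit_positions(row: str, num_bits: int, num_hashes: int) -> list[int]:
    # k bit positions from two 64-bit slices of a SHA-256 digest (double hashing).
    digest = hashlib.sha256(row.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return [(h1 + i * h2) % num_bits for i in range(num_hashes)]


class MapFile:
    """Hypothetical model of one map file: its rows plus an optional per-file bloom filter."""

    def __init__(self, rows: set[str], with_bloom: bool, num_bits: int = 200_000, num_hashes: int = 7) -> None:
        self.rows = rows
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bloom = None
        if with_bloom:
            self.bloom = bytearray(num_bits // 8 + 1)
            for row in rows:
                for p in bit_positions(row, num_bits, num_hashes):
                    self.bloom[p // 8] |= 1 << (p % 8)

    def must_be_read(self, row: str) -> bool:
        if self.bloom is None:
            return True  # without a filter, every lookup has to search this file
        # The file can be skipped as soon as any of the row's bits is unset.
        return all((self.bloom[p // 8] >> (p % 8)) & 1
                   for p in bit_positions(row, self.num_bits, self.num_hashes))


def count_file_reads(tablet: list[MapFile], queries: list[str]) -> int:
    """Total number of files searched across all lookups in one tablet."""
    return sum(f.must_be_read(row) for row in queries for f in tablet)


def rows_for_seed(seed: int, count: int) -> set[str]:
    rng = random.Random(seed)
    return {f"row_{rng.randrange(1_000_000_000):010d}" for _ in range(count)}


# Scaled down: 10,000 rows per file instead of 1 million, so the sketch runs quickly.
file_rows = [rows_for_seed(seed, 10_000) for seed in (7, 8, 9)]
queries = sorted(file_rows[0])[:500]  # rows known to be in the seed-7 file

reads_without = count_file_reads([MapFile(r, with_bloom=False) for r in file_rows], queries)
reads_with = count_file_reads([MapFile(r, with_bloom=True) for r in file_rows], queries)
print(f"file reads without bloom filters: {reads_without}")
print(f"file reads with bloom filters:    {reads_with}")
```

    Without filters every lookup searches all three files (1,500 file searches for 500 lookups); with filters, nearly every lookup searches only the file that holds the row, which is consistent with the roughly threefold speedup measured above.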
     

    You can verify that the table has three files by looking in HDFS. To do so, you need the table ID, because HDFS directory names use the table ID rather than the table name. The following command shows the table IDs.

    $ ./bin/accumulo shell -u username -p password -e 'tables -l'
    !METADATA       =>         !0
    bloom_test1     =>         o7
    bloom_test2     =>         o8
    trace           =>          1
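
    If you script this step, the mapping from table name to HDFS directory can be derived from the `tables -l` output. A minimal Python sketch, assuming the `name => id` line format shown above and the default `/accumulo` root in HDFS; `parse_table_ids` and `table_dir` are hypothetical helpers, not part of Accumulo.

```python
def parse_table_ids(tables_output: str) -> dict[str, str]:
    """Parse `tables -l` output, one 'name => id' pair per line, into {name: id}."""
    ids = {}
    for line in tables_output.splitlines():
        if "=>" not in line:
            continue  # skip blank lines and anything that is not a table entry
        name, table_id = (part.strip() for part in line.split("=>", 1))
        ids[name] = table_id
    return ids


def table_dir(table_id: str, accumulo_root: str = "/accumulo") -> str:
    # Assumes the default Accumulo location in HDFS, as this example does.
    return f"{accumulo_root}/tables/{table_id}"


tables_l_output = """\
!METADATA       =>         !0
bloom_test1     =>         o7
bloom_test2     =>         o8
trace           =>          1
"""
ids = parse_table_ids(tables_l_output)
print(table_dir(ids["bloom_test2"]))  # /accumulo/tables/o8
```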
     

    So the table ID for bloom_test2 is o8. The command below lists the files this table has in HDFS. This assumes Accumulo uses the default location in HDFS.

    $ hadoop fs -lsr /accumulo/tables/o8
    drwxr-xr-x   - username supergroup          0 2012-01-10 14:02 /accumulo/tables/o8/default_tablet
    -rw-r--r--   3 username supergroup   52672650 2012-01-10 14:01 /accumulo/tables/o8/default_tablet/F00000dj.rf
    -rw-r--r--   3 username supergroup   52436176 2012-01-10 14:01 /accumulo/tables/o8/default_tablet/F00000dk.rf
    -rw-r--r--   3 username supergroup   52850173 2012-01-10 14:02 /accumulo/tables/o8/default_tablet/F00000dl.rf
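
    The listing also suggests why the example raised table.compaction.major.ratio to 7. Accumulo describes this property as the minimum ratio of total input size to maximum input file size for running a major compaction; the Python arithmetic below applies that rule, as a simplified model that ignores how Accumulo chooses subsets of files, to the three file sizes above and compares the default ratio of 3 with the configured 7.

```python
# File sizes in bytes, copied from the `hadoop fs -lsr` listing above.
file_sizes = [52_672_650, 52_436_176, 52_850_173]

# Simplified model (assumption): following the description of table.compaction.major.ratio,
# a set of files qualifies for a major compaction once total size / largest file >= ratio.
size_ratio = sum(file_sizes) / max(file_sizes)
print(f"total / largest = {size_ratio:.3f}")
for ratio in (3, 7):  # 3 is the default; 7 is the value configured in this example
    print(f"ratio {ratio}: would compact = {size_ratio >= ratio}")
```

    Under this model the three nearly equal files give a ratio of about 2.99, just below the default of 3, so a small shift in file sizes (or a fourth flush) could have triggered a compaction; a ratio of 7 leaves a wide margin.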
     

    Running the rfile-info command shows that one of the files has a bloom filter and that the filter is about 1.5 MB.
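
    As a rough cross-check of that size, the Python arithmetic below divides the bloom filter's raw size by the number of entries, both taken from the rfile-info output below, and applies the textbook false-positive estimate for an optimally configured single bloom filter. That formula is an assumption here: Accumulo's on-disk filter layout and sizing may differ, so treat the result as an order of magnitude only.

```python
import math

# Figures reported by rfile-info for F00000dj.rf.
bloom_raw_bytes = 1_540_292
num_entries = 999_536

bits_per_entry = bloom_raw_bytes * 8 / num_entries
# Textbook estimate for one bloom filter with the optimal number of hash functions:
# p ~= exp(-(m / n) * (ln 2)^2). Accumulo's layout may differ and the raw size may
# include serialization overhead, so this is only an order-of-magnitude figure.
estimated_fp_rate = math.exp(-bits_per_entry * math.log(2) ** 2)
print(f"{bits_per_entry:.2f} bits per entry, estimated false-positive rate ~{estimated_fp_rate:.2%}")
```

    This works out to roughly 12 bits per entry.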

    $ ./bin/accumulo rfile-info /accumulo/tables/o8/default_tablet/F00000dj.rf
    Locality group         : <DEFAULT>
      Start block          : 0
      Num   blocks         : 752
      Index level 0        : 43,598 bytes  1 blocks
      First key            : row_0000001169 foo:1 [exampleVis] 1326222052539 false
      Last key             : row_0999999421 foo:1 [exampleVis] 1326222052058 false
      Num entries          : 999,536
      Column families      : [foo]

    Meta block     : BCFile.index
      Raw size             : 4 bytes
      Compressed size      : 12 bytes
      Compression type     : gz

    Meta block     : RFile.index
      Raw size             : 43,696 bytes
      Compressed size      : 15,592 bytes
      Compression type     : gz

    Meta block     : acu_bloom
      Raw size             : 1,540,292 bytes
      Compressed size      : 1,433,115 bytes
      Compression type     : gz