Subject: svn commit: r1240172 - /incubator/accumulo/site/trunk/content/accumulo/example/wikisearch.mdtext
Date: Fri, 03 Feb 2012 13:54:21 -0000
To: accumulo-commits@incubator.apache.org
From: ecn@apache.org
Reply-To: accumulo-dev@incubator.apache.org

Author: ecn
Date: Fri Feb 3 13:54:21 2012
New Revision: 1240172

URL: http://svn.apache.org/viewvc?rev=1240172&view=rev
Log: add cell borders to tables, fix
alignment, add term cardinalities and query result set sizes

Modified:
    incubator/accumulo/site/trunk/content/accumulo/example/wikisearch.mdtext

Modified: incubator/accumulo/site/trunk/content/accumulo/example/wikisearch.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/example/wikisearch.mdtext?rev=1240172&r1=1240171&r2=1240172&view=diff
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/example/wikisearch.mdtext (original)
+++ incubator/accumulo/site/trunk/content/accumulo/example/wikisearch.mdtext Fri Feb 3 13:54:21 2012
@@ -33,7 +33,18 @@ The example uses an indexing technique h
 In the example, Accumulo tracks the cardinality of all terms as elements are ingested. If the cardinality is small enough, it will track the set of documents by term directly. For example:
@@ -42,16 +53,16 @@ table td,th {padding-right: 10px;}
 Value (count, document list)
 Octopus
-2
-[Document 57, Document 220]
+2
+[Document 57, Document 220]
 Other
-172849
-[]
+172849
+[]
 Ostrich
-1
-[Document 901]
+1
+[Document 901]
@@ -99,6 +110,7 @@ The example also creates a reverse word
 2
 Word, Octopus
 Document 220
+
 Of course, there would be large numbers of documents in each partition, and the elements of those documents would be interlaced according to their sort order.
@@ -149,6 +161,7 @@ We performed the following queries, and
 Query
 Samples (seconds)
 Matches
+Result Size
 “old” and “man” and “sea”
 4.07
@@ -157,6 +170,7 @@ We performed the following queries, and
 3.85
 3.67
 22956
+3830102
 “paris” and “in” and “the” and “spring”
 3.06
@@ -165,6 +179,7 @@ We performed the following queries, and
 3.02
 2.92
 10755
+1757293
 “rubber” and “duckie” and “ernie”
 0.08
@@ -173,6 +188,7 @@ We performed the following queries, and
 0.11
 0.1
 6
+808
 “fast” and ( “furious” or “furriest”)
 1.34
@@ -181,6 +197,7 @@ We performed the following queries, and
 1.31
 1.31
 2973
+493800
 “slashdot” and “grok”
 0.06
@@ -189,6 +206,7 @@ We performed the following queries, and
 0.06
 0.06
 14
+2371
 “three” and “little” and “pigs”
 0.92
@@ -197,10 +215,35 @@ We performed the following queries, and
 1.08
 0.88
 2742
+481531
 Because the terms are tested together within the region server, even fairly high-cardinality terms such as “old,” “man,” and “sea” can be tested efficiently, without needing to return to the client, or make distributed calls between servers to perform the intersection between terms.
+For reference, here are the cardinalities for all the terms in the query (remember, this is across all languages loaded:
+
+
+Term Cardinality
+ducky 795
+ernie 13433
+fast 166813
+furious 10535
+furriest 45
+grok 1168
+in 1884638
+little 320748
+man 548238
+old 720795
+paris 232464
+pigs 8356
+rubber 17235
+sea 247231
+slashdot 2343
+spring 125605
+three 718810
+
+
+
 Accumulo supports caching index information, which is turned on by default, and for the non-index blocks of a file, which is not. After turning on data block caching for the wiki table: