Subject: svn commit: r803863 - /websites/staging/accumulo/trunk/content/accumulo/example/wikisearch.html
Date: Fri, 03 Feb 2012 13:54:28 -0000
To: accumulo-commits@incubator.apache.org
From: buildbot@apache.org

Author: buildbot
Date: Fri Feb 3 13:54:28 2012
New Revision: 803863

Log:
Staging update by buildbot for accumulo

Modified:
    websites/staging/accumulo/trunk/content/accumulo/example/wikisearch.html
Modified: websites/staging/accumulo/trunk/content/accumulo/example/wikisearch.html
==============================================================================
--- websites/staging/accumulo/trunk/content/accumulo/example/wikisearch.html (original)
+++ websites/staging/accumulo/trunk/content/accumulo/example/wikisearch.html Fri Feb 3 13:54:28 2012
@@ -103,7 +103,18 @@

In the example, Accumulo tracks the cardinality of every term as elements are ingested. If a term's cardinality is small enough, it also tracks the set of documents containing that term directly. For example:

@@ -112,16 +123,16 @@ table td,th {padding-right: 10px;}
Value (count, document list)
Octopus
-2
-[Document 57, Document 220]
+2
+[Document 57, Document 220]
Other
-172849
-[]
+172849
+[]
Ostrich
-1
-[Document 901]
+1
+[Document 901]
@@ -167,6 +178,7 @@ table td,th {padding-right: 10px;}
2 Word, Octopus Document 220
+

Of course, there would be large numbers of documents in each partition, and the elements of those documents would be interlaced according to their sort order.
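To picture how the elements of different documents interlace, here is a minimal Java sketch that models one partition's entries as a sorted set of keys. The key layout (partition, then a column such as "Word, Octopus", then the document ID) is a simplified assumption read off the example row above, not the wikisearch table's exact schema; the class `PartitionLayout`, its methods, and the terms "Ink" and "Sea" are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical, simplified model of one partition. Entries sort by
// partition, then column (field + term), then document ID, so the
// elements of different documents end up interlaced in sort order.
public class PartitionLayout {

    // Builds a sortable key "partition | column | document" (assumed layout).
    static String key(String partition, String column, String doc) {
        return partition + " | " + column + " | " + doc;
    }

    static List<String> sortedEntries() {
        TreeSet<String> partition = new TreeSet<>();
        // Assumption: Document 57 and Document 220 both live in partition 2.
        partition.add(key("2", "Word, Sea", "Document 57"));
        partition.add(key("2", "Word, Octopus", "Document 220"));
        partition.add(key("2", "Word, Octopus", "Document 57"));
        partition.add(key("2", "Word, Ink", "Document 220"));
        return new ArrayList<>(partition);
    }

    public static void main(String[] args) {
        // Keys compare lexicographically, so "Document 220" sorts before
        // "Document 57", and entries for one term from different documents
        // sit next to each other.
        for (String entry : sortedEntries()) {
            System.out.println(entry);
        }
    }
}
```

Running it prints the Ink entry first, then both Octopus entries (Document 220 before Document 57), then the Sea entry: the two documents' elements alternate rather than being stored as contiguous blocks.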

@@ -210,6 +222,7 @@ table td,th {padding-right: 10px;}
Query Samples (seconds) Matches
+Result Size
“old” and “man” and “sea” 4.07
@@ -218,6 +231,7 @@ table td,th {padding-right: 10px;}
3.85 3.67 22956
+3830102
“paris” and “in” and “the” and “spring” 3.06
@@ -226,6 +240,7 @@ table td,th {padding-right: 10px;}
3.02 2.92 10755
+1757293
“rubber” and “duckie” and “ernie” 0.08
@@ -234,6 +249,7 @@ table td,th {padding-right: 10px;}
0.11 0.1 6
+808
“fast” and (“furious” or “furriest”) 1.34
@@ -242,6 +258,7 @@ table td,th {padding-right: 10px;}
1.31 1.31 2973
+493800
“slashdot” and “grok” 0.06
@@ -250,6 +267,7 @@ table td,th {padding-right: 10px;}
0.06 0.06 14
+2371
“three” and “little” and “pigs” 0.92
@@ -258,9 +276,32 @@ table td,th {padding-right: 10px;}
1.08 0.88 2742
+481531

Because the terms are tested together within the tablet server, even fairly high-cardinality terms such as “old,” “man,” and “sea” can be tested efficiently, without returning to the client or making distributed calls between servers to intersect the terms.
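The local intersection described above can be sketched as a linear merge over sorted document lists. This is a minimal Java illustration of the general technique, assuming each term's matching document IDs within one partition are available in ascending order; the class `LocalIntersection` and the document IDs are made up, and the sketch does not reproduce the wikisearch server-side code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: AND together the sorted document lists that a single
// partition holds for each query term. Because all lists live on the same
// server, only the final matches would need to travel back to the client.
public class LocalIntersection {

    // Intersects two ascending lists of document IDs in one linear pass.
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = Integer.compare(a.get(i), b.get(j));
            if (cmp == 0) {
                out.add(a.get(i)); // document contains both terms
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // advance whichever list holds the smaller ID
            } else {
                j++;
            }
        }
        return out;
    }

    // Folds pairwise intersections across all terms, stopping early once empty.
    static List<Integer> intersectAll(List<List<Integer>> termLists) {
        if (termLists.isEmpty()) {
            return new ArrayList<>();
        }
        List<Integer> result = termLists.get(0);
        for (int k = 1; k < termLists.size() && !result.isEmpty(); k++) {
            result = intersect(result, termLists.get(k));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up document IDs for "old", "man", and "sea" in one partition.
        List<Integer> old = List.of(3, 8, 15, 42, 57);
        List<Integer> man = List.of(8, 15, 23, 57, 90);
        List<Integer> sea = List.of(1, 15, 57, 64);
        System.out.println(intersectAll(List.of(old, man, sea))); // [15, 57]
    }
}
```

Each pass touches every ID at most once, so the cost grows with the sizes of the term lists rather than with the number of round trips, which is why even high-cardinality terms stay cheap when the merge runs next to the data.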

+

For reference, here are the cardinalities of the query terms (remember, these counts are across all languages loaded):

+ +
Term Cardinality +
ducky 795 +
ernie 13433 +
fast 166813 +
furious 10535 +
furriest 45 +
grok 1168 +
in 1884638 +
little 320748 +
man 548238 +
old 720795 +
paris 232464 +
pigs 8356 +
rubber 17235 +
sea 247231 +
slashdot 2343 +
spring 125605 +
three 718810 +
+

Accumulo can cache both a file's index blocks, which is enabled by default, and its non-index (data) blocks, which is not. After turning on data block caching for the wiki table: