accumulo-commits mailing list archives

From e..@apache.org
Subject svn commit: r803864 - in /websites/production/accumulo: ./ content/accumulo/example/wikisearch.html
Date Fri, 03 Feb 2012 13:54:56 GMT
Author: ecn
Date: Fri Feb  3 13:54:56 2012
New Revision: 803864

Log:
Publishing merge to accumulo site by ecn

Modified:
    websites/production/accumulo/   (props changed)
    websites/production/accumulo/content/accumulo/example/wikisearch.html

Propchange: websites/production/accumulo/
------------------------------------------------------------------------------
--- svn:mergeinfo (original)
+++ svn:mergeinfo Fri Feb  3 13:54:56 2012
@@ -1 +1 @@
-/websites/staging/accumulo/trunk:797863-803818
+/websites/staging/accumulo/trunk:797863-803863

Modified: websites/production/accumulo/content/accumulo/example/wikisearch.html
==============================================================================
--- websites/production/accumulo/content/accumulo/example/wikisearch.html (original)
+++ websites/production/accumulo/content/accumulo/example/wikisearch.html Fri Feb  3 13:54:56 2012
@@ -103,7 +103,18 @@
 </ol>
 <p>In the example, Accumulo tracks the cardinality of all terms as elements are ingested.
 If the cardinality is small enough, it will track the set of documents by term directly.
 For example:</p>
 <style type="text/css">
-table td,th {padding-right: 10px;}
+table, td, th {
+  padding-right: 5px;
+  padding-left: 5px;
+  border: 1px solid black;
+  border-collapse: collapse;
+}
+td {
+  text-align: right;
+}
+.lt {
+  text-align: left;
+}
 </style>
 
 <table>
@@ -112,16 +123,16 @@ table td,th {padding-right: 10px;}
 <th colspan="2">Value (count, document list)</th>
 </tr><tr>
 <td>Octopus
-<td align="right">2
-<td>[Document 57, Document 220]
+<td>2
+<td class='lt'>[Document 57, Document 220]
 </tr><tr>
 <td>Other
-<td align="right">172849
-<td>[]
+<td>172849
+<td class='lt'>[]
 </tr><tr>
 <td>Ostrich
-<td align="right">1
-<td>[Document 901]
+<td>1
+<td class='lt'>[Document 901]
 </tr>
 </table>
 
@@ -167,6 +178,7 @@ table td,th {padding-right: 10px;}
 <td>2
 <td>Word, Octopus
 <td>Document 220
+<td>
 </table>
 
 <p>Of course, there would be large numbers of documents in each partition, and the elements of those documents would be interlaced according to their sort order.</p>
@@ -210,6 +222,7 @@ table td,th {padding-right: 10px;}
 <th>Query
 <th colspan="5">Samples (seconds)
 <th>Matches
+<th>Result Size
 <tr>
 <td>“old” and “man” and “sea”
 <td>4.07
@@ -218,6 +231,7 @@ table td,th {padding-right: 10px;}
 <td>3.85
 <td>3.67
 <td>22956
+<td>3830102
 <tr>
 <td>“paris” and “in” and “the” and “spring”
 <td>3.06
@@ -226,6 +240,7 @@ table td,th {padding-right: 10px;}
 <td>3.02
 <td>2.92
 <td>10755
+<td>1757293
 <tr>
 <td>“rubber” and “duckie” and “ernie”
 <td>0.08
@@ -234,6 +249,7 @@ table td,th {padding-right: 10px;}
 <td>0.11
 <td>0.1
 <td>6
+<td>808
 <tr>
 <td>“fast”  and ( “furious” or “furriest”) 
 <td>1.34
@@ -242,6 +258,7 @@ table td,th {padding-right: 10px;}
 <td>1.31
 <td>1.31
 <td>2973
+<td>493800
 <tr>
 <td>“slashdot” and “grok”
 <td>0.06
@@ -250,6 +267,7 @@ table td,th {padding-right: 10px;}
 <td>0.06
 <td>0.06
 <td>14
+<td>2371
 <tr>
 <td>“three” and “little” and “pigs”
 <td>0.92
@@ -258,9 +276,32 @@ table td,th {padding-right: 10px;}
 <td>1.08
 <td>0.88
 <td>2742
+<td>481531
 </table>
 
 <p>Because the terms are tested together within the region server, even fairly high-cardinality terms such as “old,” “man,” and “sea” can be tested efficiently, without needing to return to the client, or make distributed calls between servers to perform the intersection between terms.</p>
+<p>For reference, here are the cardinalities for all the terms in the query (remember, this is across all languages loaded):</p>
+<table>
+<tr> <th>Term <th> Cardinality
+<tr> <td> ducky <td> 795
+<tr> <td> ernie <td> 13433
+<tr> <td> fast <td> 166813
+<tr> <td> furious <td> 10535
+<tr> <td> furriest <td> 45
+<tr> <td> grok <td> 1168
+<tr> <td> in <td> 1884638
+<tr> <td> little <td> 320748
+<tr> <td> man <td> 548238
+<tr> <td> old <td> 720795
+<tr> <td> paris <td> 232464
+<tr> <td> pigs <td> 8356
+<tr> <td> rubber <td> 17235
+<tr> <td> sea <td> 247231
+<tr> <td> slashdot <td> 2343
+<tr> <td> spring <td> 125605
+<tr> <td> three <td> 718810
+</table>
+
 <p>Accumulo supports caching index information, which is turned on by default, and for the non-index blocks of a file, which is not. After turning on data block caching for the wiki table:</p>
 <table>
 <tr>



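The page changed by this commit says that term cardinality is tracked during ingest and that, when the cardinality is small enough, the set of documents is kept with the term. The following is a minimal Python sketch of that bookkeeping, not the wikisearch example's actual code. The names `TermEntry`, `ingest`, and `MAX_TRACKED_DOCS` are hypothetical, and the cutoff value is an assumption: the page only says "small enough". The sketch also assumes each (term, document) pair is ingested once.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff: the page only says "small enough", so 3 is an assumption.
MAX_TRACKED_DOCS = 3


@dataclass
class TermEntry:
    """Value stored per term: (count, document list)."""
    count: int = 0
    docs: list = field(default_factory=list)
    overflowed: bool = False

    def add(self, doc_id):
        # The count is always maintained, whatever the cardinality.
        self.count += 1
        if self.overflowed:
            return
        if self.count > MAX_TRACKED_DOCS:
            # Too many documents: stop tracking them and keep only the count.
            self.docs = []
            self.overflowed = True
        else:
            self.docs.append(doc_id)


def ingest(index, term, doc_id):
    """Record that `term` occurs in `doc_id` (assumed to be called once per pair)."""
    index.setdefault(term, TermEntry()).add(doc_id)


index = {}
ingest(index, "Octopus", "Document 57")
ingest(index, "Octopus", "Document 220")
ingest(index, "Ostrich", "Document 901")
for n in range(5):
    ingest(index, "Other", f"Document {n}")

for term, entry in sorted(index.items()):
    print(term, entry.count, entry.docs)
```

With this cutoff, "Octopus" and "Ostrich" keep their document lists, while "Other" keeps only its count and an empty list, which matches the shape of the (count, document list) table in the diff.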