incubator-accumulo-commits mailing list archives

From e..@apache.org
Subject svn commit: r1240172 - /incubator/accumulo/site/trunk/content/accumulo/example/wikisearch.mdtext
Date Fri, 03 Feb 2012 13:54:21 GMT
Author: ecn
Date: Fri Feb  3 13:54:21 2012
New Revision: 1240172

URL: http://svn.apache.org/viewvc?rev=1240172&view=rev
Log:
add cell borders to tables, fix alignment, add term cardinalities and query result set sizes

Modified:
    incubator/accumulo/site/trunk/content/accumulo/example/wikisearch.mdtext

Modified: incubator/accumulo/site/trunk/content/accumulo/example/wikisearch.mdtext
URL: http://svn.apache.org/viewvc/incubator/accumulo/site/trunk/content/accumulo/example/wikisearch.mdtext?rev=1240172&r1=1240171&r2=1240172&view=diff
==============================================================================
--- incubator/accumulo/site/trunk/content/accumulo/example/wikisearch.mdtext (original)
+++ incubator/accumulo/site/trunk/content/accumulo/example/wikisearch.mdtext Fri Feb  3 13:54:21 2012
@@ -33,7 +33,18 @@ The example uses an indexing technique h
 In the example, Accumulo tracks the cardinality of all terms as elements are ingested.  If the cardinality is small enough, it will track the set of documents by term directly.  For example:
 
 <style type="text/css">
-table td,th {padding-right: 10px;}
+table, td, th {
+  padding-right: 5px;
+  padding-left: 5px;
+  border: 1px solid black;
+  border-collapse: collapse;
+}
+td {
+  text-align: right;
+}
+.lt {
+  text-align: left;
+}
 </style>
 
 <table>
@@ -42,16 +53,16 @@ table td,th {padding-right: 10px;}
 <th colspan="2">Value (count, document list)</th>
 </tr><tr>
 <td>Octopus
-<td align="right">2
-<td>[Document 57, Document 220]
+<td>2
+<td class='lt'>[Document 57, Document 220]
 </tr><tr>
 <td>Other
-<td align="right">172849
-<td>[]
+<td>172849
+<td class='lt'>[]
 </tr><tr>
 <td>Ostrich
-<td align="right">1
-<td>[Document 901]
+<td>1
+<td class='lt'>[Document 901]
 </tr>
 </table>
 
@@ -99,6 +110,7 @@ The example also creates a reverse word 
 <td>2
 <td>Word, Octopus
 <td>Document 220
+<td>
 </table>
 
 Of course, there would be large numbers of documents in each partition, and the elements of those documents would be interlaced according to their sort order.
@@ -149,6 +161,7 @@ We performed the following queries, and 
 <th>Query
 <th colspan="5">Samples (seconds)
 <th>Matches
+<th>Result Size
 <tr>
 <td>“old” and “man” and “sea”
 <td>4.07
@@ -157,6 +170,7 @@ We performed the following queries, and 
 <td>3.85
 <td>3.67
 <td>22956
+<td>3830102
 <tr>
 <td>“paris” and “in” and “the” and “spring”
 <td>3.06
@@ -165,6 +179,7 @@ We performed the following queries, and 
 <td>3.02
 <td>2.92
 <td>10755
+<td>1757293
 <tr>
 <td>“rubber” and “duckie” and “ernie”
 <td>0.08
@@ -173,6 +188,7 @@ We performed the following queries, and 
 <td>0.11
 <td>0.1
 <td>6
+<td>808
 <tr>
 <td>“fast”  and ( “furious” or “furriest”) 
 <td>1.34
@@ -181,6 +197,7 @@ We performed the following queries, and 
 <td>1.31
 <td>1.31
 <td>2973
+<td>493800
 <tr>
 <td>“slashdot” and “grok”
 <td>0.06
@@ -189,6 +206,7 @@ We performed the following queries, and 
 <td>0.06
 <td>0.06
 <td>14
+<td>2371
 <tr>
 <td>“three” and “little” and “pigs”
 <td>0.92
@@ -197,10 +215,35 @@ We performed the following queries, and 
 <td>1.08
 <td>0.88
 <td>2742
+<td>481531
 </table>
 
 Because the terms are tested together within the region server, even fairly high-cardinality terms such as “old,” “man,” and “sea” can be tested efficiently, without needing to return to the client, or make distributed calls between servers to perform the intersection between terms.
 
+For reference, here are the cardinalities for all the terms in the query (remember, this is across all languages loaded:
+
+<table>
+<tr> <th>Term <th> Cardinality
+<tr> <td> ducky <td> 795
+<tr> <td> ernie <td> 13433
+<tr> <td> fast <td> 166813
+<tr> <td> furious <td> 10535
+<tr> <td> furriest <td> 45
+<tr> <td> grok <td> 1168
+<tr> <td> in <td> 1884638
+<tr> <td> little <td> 320748
+<tr> <td> man <td> 548238
+<tr> <td> old <td> 720795
+<tr> <td> paris <td> 232464
+<tr> <td> pigs <td> 8356
+<tr> <td> rubber <td> 17235
+<tr> <td> sea <td> 247231
+<tr> <td> slashdot <td> 2343
+<tr> <td> spring <td> 125605
+<tr> <td> three <td> 718810
+</table>
+
+
 Accumulo supports caching index information, which is turned on by default, and for the non-index blocks of a file, which is not. After turning on data block caching for the wiki table:
 
 <table>


