incubator-accumulo-commits mailing list archives

From build...@apache.org
Subject svn commit: r798727 - in /websites/staging/accumulo/trunk/content/accumulo: user_manual_1.3-incubating/ user_manual_1.4-incubating/
Date Tue, 15 Nov 2011 20:53:17 GMT
Author: buildbot
Date: Tue Nov 15 20:53:16 2011
New Revision: 798727

Log:
Staging update by buildbot

Added:
    websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/img6.png   (with props)
    websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/img7.png   (with props)
Modified:
    websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/Contents.html
    websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/Shell_Commands.html
    websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/Table_Configuration.html
    websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/Contents.html
    websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/Shell_Commands.html
    websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/Table_Configuration.html
    websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/Table_Design.html
    websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/img2.png
    websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/img3.png
    websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/img4.png
    websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/img5.png

Modified: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/Contents.html
==============================================================================
--- websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/Contents.html (original)
+++ websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/Contents.html Tue Nov 15 20:53:16 2011
@@ -165,19 +165,14 @@
 <ul>
 <li><a href="Table_Configuration.html#Setting_Iterators_via_the_Shell">Setting Iterators via the Shell</a></li>
 <li><a href="Table_Configuration.html#Setting_Iterators_Programmatically">Setting Iterators Programmatically</a></li>
+<li><a href="Table_Configuration.html#Versioning_Iterators_and_Timestamps">Versioning Iterators and Timestamps</a></li>
+<li><a href="Table_Configuration.html#Filtering_Iterators">Filtering Iterators</a></li>
 </ul>
 </li>
 <li>
-<p><a href="Table_Configuration.html#Versioning_Iterators_and_Timestamps">Versioning Iterators and Timestamps</a></p>
-<ul>
-<li><a href="Table_Configuration.html#Logical_Time">Logical Time</a></li>
-<li><a href="Table_Configuration.html#Deletes">Deletes</a></li>
-</ul>
-</li>
-<li>
-<p><a href="Table_Configuration.html#Filtering_Iterators">Filtering Iterators</a></p>
+<p><a href="Table_Configuration.html#Aggregating_Iterators">Aggregating Iterators</a></p>
 </li>
-<li><a href="Table_Configuration.html#Aggregating_Iterators">Aggregating Iterators</a></li>
+<li><a href="Table_Configuration.html#Block_Cache">Block Cache</a></li>
 </ul>
 </li>
 <li>

Modified: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/Shell_Commands.html
==============================================================================
--- websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/Shell_Commands.html (original)
+++ websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/Shell_Commands.html Tue Nov 15 20:53:16 2011
@@ -479,7 +479,7 @@
 </p>
 <div class="codehilite"><pre><span class="err">usage:</span> <span class="err">listscans</span> <span class="err">[-?]</span> <span class="err">[-np]</span> <span class="err">[-ts</span> <span class="err">&lt;tablet</span> <span class="err">server&gt;]</span>   
 <span class="err">description:</span> <span class="err">list</span> <span class="err">what</span> <span class="err">scans</span> <span class="err">are</span> <span class="err">currently</span> <span class="err">running</span> <span class="err">in</span> <span class="err">accumulo.</span> <span class="err">See</span> <span class="err">the</span>   
-       <span class="err">accumulo.core.client.admin.ActiveScan</span> <span class="err">javadoc</span> <span class="err">for</span> <span class="err">more</span> <span class="err">information</span>   
+       <span class="err">org.apache.accumulo.core.client.admin.ActiveScan</span> <span class="err">javadoc</span> <span class="err">for</span> <span class="err">more</span> <span class="err">information</span>   
        <span class="err">about</span> <span class="err">columns.</span>   
   <span class="err">-?,-help</span>  <span class="err">display</span> <span class="err">this</span> <span class="err">help</span>   
   <span class="err">-np,-no-pagination</span>  <span class="err">disables</span> <span class="err">pagination</span> <span class="err">of</span> <span class="err">output</span>   

Modified: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/Table_Configuration.html
==============================================================================
--- websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/Table_Configuration.html (original)
+++ websites/staging/accumulo/trunk/content/accumulo/user_manual_1.3-incubating/Table_Configuration.html Tue Nov 15 20:53:16 2011
@@ -100,9 +100,8 @@
 <li><a href="Table_Configuration.html#Constraints">Constraints</a></li>
 <li><a href="Table_Configuration.html#Bloom_Filters">Bloom Filters</a></li>
 <li><a href="Table_Configuration.html#Iterators">Iterators</a></li>
-<li><a href="Table_Configuration.html#Versioning_Iterators_and_Timestamps">Versioning Iterators and Timestamps</a></li>
-<li><a href="Table_Configuration.html#Filtering_Iterators">Filtering Iterators</a></li>
 <li><a href="Table_Configuration.html#Aggregating_Iterators">Aggregating Iterators</a></li>
+<li><a href="Table_Configuration.html#Block_Cache">Block Cache</a></li>
 </ul>
 <hr />
 <h2 id="a_idtable_configurationa_table_configuration"><a id=Table_Configuration></a> Table Configuration</h2>
@@ -208,7 +207,7 @@ accumulo/docs/examples/README.bloom . </
 
 
 <p>Tables support separate Iterator settings to be applied at scan time, upon minor compaction and upon major compaction. For most uses, tables will have identical iterator settings for all three to avoid inconsistent results. </p>
-<h2 id="a_idversioning_iterators_and_timestampsa_versioning_iterators_and_timestamps"><a id=Versioning_Iterators_and_Timestamps></a> Versioning Iterators and Timestamps</h2>
+<h3 id="a_idversioning_iterators_and_timestampsa_versioning_iterators_and_timestamps"><a id=Versioning_Iterators_and_Timestamps></a> Versioning Iterators and Timestamps</h3>
 <p>Accumulo provides the capability to manage versioned data through the use of timestamps within the Key. If a timestamp is not specified in the key created by the client then the system will set the timestamp to the current time. Two keys with identical rowIDs and columns but different timestamps are considered two versions of the same key. If two inserts are made into accumulo with the same rowID, column, and timestamp, then the behavior is non-deterministic. </p>
 <p>Timestamps are sorted in descending order, so the most recent data comes first. Accumulo can be configured to return the top k versions, or versions later than a given date. The default is to return the one most recent version. </p>
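The version policy described in the hunk above (timestamps sorted newest first, return the top k versions or those later than a date) can be modeled in a few lines of Java. This is a minimal sketch under stated assumptions: `VersionPolicySketch`, `keepNewest` and `keepLaterThan` are hypothetical names, the input is the list of timestamps for a single row and column, and it is not Accumulo's VersioningIterator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of the versioning rules; not Accumulo's VersioningIterator.
// Input: timestamps of the versions of ONE row and column.
class VersionPolicySketch {

    /** Returns at most maxVersions timestamps, newest first. */
    static List<Long> keepNewest(List<Long> timestamps, int maxVersions) {
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Comparator.reverseOrder()); // descending: most recent version first
        int keep = Math.min(maxVersions, sorted.size());
        return new ArrayList<>(sorted.subList(0, keep));
    }

    /** Returns the timestamps strictly later than the cutoff, newest first. */
    static List<Long> keepLaterThan(List<Long> timestamps, long cutoff) {
        List<Long> result = new ArrayList<>();
        for (long ts : timestamps) {
            if (ts > cutoff) result.add(ts);
        }
        result.sort(Comparator.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<Long> versions = List.of(100L, 300L, 200L);
        System.out.println(keepNewest(versions, 1));       // default policy: [300]
        System.out.println(keepNewest(versions, 2));       // top 2: [300, 200]
        System.out.println(keepLaterThan(versions, 150L)); // later than 150: [300, 200]
    }
}
```

Calling `keepNewest` with `maxVersions = 1` corresponds to the default behavior the manual describes.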
 <p>The version policy can be changed by changing the VersioningIterator options for a table as follows: </p>
@@ -223,16 +222,16 @@ accumulo/docs/examples/README.bloom . </
 </pre></div>
 
 
-<h3 id="a_idlogical_timea_logical_time"><a id=Logical_Time></a> Logical Time</h3>
+<h4 id="a_idlogical_timea_logical_time"><a id=Logical_Time></a> Logical Time</h4>
 <p>Accumulo 1.2 introduces the concept of logical time. This ensures that timestamps set by accumulo always move forward. This helps avoid problems caused by TabletServers that have different time settings. The per tablet counter gives unique one up time stamps on a per mutation basis. When using time in milliseconds, if two things arrive within the same millisecond then both receive the same timestamp. </p>
 <p>A table can be configured to use logical timestamps at creation time as follows: </p>
 <div class="codehilite"><pre><span class="n">user</span><span class="nv">@myinstance</span><span class="o">&gt;</span> <span class="n">createtable</span> <span class="o">-</span><span class="n">tl</span> <span class="n">logical</span>
 </pre></div>
 
 
-<h3 id="a_iddeletesa_deletes"><a id=Deletes></a> Deletes</h3>
+<h4 id="a_iddeletesa_deletes"><a id=Deletes></a> Deletes</h4>
 <p>Deletes are special keys in accumulo that get sorted along with all the other data. When a delete key is inserted, accumulo will not show anything that has a timestamp less than or equal to the delete key. During major compaction, any keys older than a delete key are omitted from the new file created, and the omitted keys are removed from disk as part of the regular garbage collection process. </p>
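The scan-time effect of the delete rule in the Deletes paragraph above can be illustrated with a small sketch. It assumes all entries are versions of one row and column, and the hypothetical `DeleteMaskSketch` class only models what a scan would show; it does not model major compaction or garbage collection, and it is not how Accumulo's iterators are implemented.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch of how a delete key hides older data at scan time.
// All entries are versions of ONE row and column; names are illustrative only.
class DeleteMaskSketch {

    static final class Entry {
        final long timestamp;
        final String value;      // unused for delete markers
        final boolean isDelete;

        Entry(long timestamp, String value, boolean isDelete) {
            this.timestamp = timestamp;
            this.value = value;
            this.isDelete = isDelete;
        }
    }

    /** Values a scan would return, newest first: entries with timestamp <= a delete key are hidden. */
    static List<String> visible(List<Entry> versions) {
        long newestDelete = Long.MIN_VALUE;
        for (Entry e : versions) {
            if (e.isDelete) newestDelete = Math.max(newestDelete, e.timestamp);
        }
        List<Entry> kept = new ArrayList<>();
        for (Entry e : versions) {
            if (!e.isDelete && e.timestamp > newestDelete) kept.add(e);
        }
        kept.sort(Comparator.comparingLong((Entry x) -> x.timestamp).reversed()); // newest first
        List<String> values = new ArrayList<>();
        for (Entry e : kept) values.add(e.value);
        return values;
    }

    public static void main(String[] args) {
        List<Entry> versions = List.of(
                new Entry(10, "old", false),
                new Entry(20, null, true),        // delete key at t=20
                new Entry(20, "same-ts", false),  // hidden: timestamp equal to the delete key
                new Entry(30, "new", false));
        System.out.println(visible(versions)); // [new]
    }
}
```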
-<h2 id="a_idfiltering_iteratorsa_filtering_iterators"><a id=Filtering_Iterators></a> Filtering Iterators</h2>
+<h3 id="a_idfiltering_iteratorsa_filtering_iterators"><a id=Filtering_Iterators></a> Filtering Iterators</h3>
 <p>When scanning over a set of key-value pairs it is possible to apply an arbitrary filtering policy through the use of a FilteringIterator. These types of iterators return only key-value pairs that satisfy the filter logic. Accumulo has two built-in filtering iterators that can be configured on any table: AgeOff and RegEx. More can be added by writing a Java class that implements the <br />
 org.apache.accumulo.core.iterators.filter.Filter interface. </p>
 <p>The AgeOff filter can be configured to remove data older than a certain date or a fixed amount of time from the present. The following example sets a table to delete everything inserted over 30 seconds ago: </p>
@@ -338,7 +337,22 @@ org.apache.accumulo.core.iterators.filte
 <p>Additional Aggregators can be added by creating a Java class that implements <br />
 <strong>org.apache.accumulo.core.iterators.aggregation.Aggregator</strong> and adding a jar containing that class to Accumulo's lib directory. </p>
 <p>An example of an aggregator can be found under <br />
-accumulo/src/examples/main/java/accumulo/examples/aggregation/SortedSetAggregator.java </p>
+accumulo/src/examples/main/java/org/apache/accumulo/examples/aggregation/SortedSetAggregator.java </p>
+<h2 id="a_idblock_cachea_block_cache"><a id=Block_Cache></a> Block Cache</h2>
+<p>To increase throughput for commonly accessed entries, Accumulo employs a block cache. This block cache buffers data in memory so that it does not have to be read from disk. The RFile format that Accumulo prefers is a mix of index blocks and data blocks, where the index blocks are used to find the appropriate data blocks. Typical queries to Accumulo result in a binary search over several index blocks followed by a linear scan of one or more data blocks. </p>
+<p>The block cache can be configured on a per-table basis, and all tablets hosted on a tablet server share a single resource pool. To configure the size of the tablet server's block cache, set the following properties: </p>
+<div class="codehilite"><pre><span class="n">tserver</span><span class="o">.</span><span class="n">cache</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">size:</span> <span class="n">Specifies</span> <span class="n">the</span> <span class="n">size</span> <span class="n">of</span> <span class="n">the</span> <span class="n">cache</span> <span class="k">for</span> <span class="n">file</span> <span class="n">data</span> <span class="n">blocks</span><span class="o">.</span>
+<span class="n">tserver</span><span class="o">.</span><span class="n">cache</span><span class="o">.</span><span class="nb">index</span><span class="o">.</span><span class="n">size:</span> <span class="n">Specifies</span> <span class="n">the</span> <span class="n">size</span> <span class="n">of</span> <span class="n">the</span> <span class="n">cache</span> <span class="k">for</span> <span class="n">file</span> <span class="n">indices</span><span class="o">.</span>
+</pre></div>
+
+
+<p>To enable the block cache for your table, set the following properties: </p>
+<div class="codehilite"><pre><span class="n">table</span><span class="o">.</span><span class="n">cache</span><span class="o">.</span><span class="n">block</span><span class="o">.</span><span class="n">enable:</span> <span class="n">Determines</span> <span class="n">whether</span> <span class="n">file</span> <span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="n">block</span> <span class="n">cache</span> <span class="n">is</span> <span class="n">enabled</span><span class="o">.</span>
+<span class="n">table</span><span class="o">.</span><span class="n">cache</span><span class="o">.</span><span class="nb">index</span><span class="o">.</span><span class="n">enable:</span> <span class="n">Determines</span> <span class="n">whether</span> <span class="nb">index</span> <span class="n">cache</span> <span class="n">is</span> <span class="n">enabled</span><span class="o">.</span>
+</pre></div>
+
+
+<p>The block cache can significantly alleviate hot spots and reduce query latency. It is enabled by default for the !METADATA table. </p>
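The read path described in the Block Cache section above (a binary search over index data, then a linear scan of a data block, with recently used blocks kept in memory) can be sketched as follows. This is a simplified model under loud assumptions: a single in-memory index array stands in for several RFile index blocks, only data blocks are cached, and the cache evicts the least recently used block, which is an assumption rather than something the manual states. `BlockCacheSketch`, `contains` and `diskReads` are hypothetical names.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, hypothetical model of an RFile read path with a block cache.
// Assumptions: one sorted index array (first key of each data block) stands in for
// several index blocks, only data blocks are cached, and eviction is least-recently-used.
class BlockCacheSketch {
    private final String[] firstKeyOfBlock;          // index: first key in each data block
    private final List<List<String>> blocks;         // "on-disk" data blocks, each non-empty and sorted
    private final Map<Integer, List<String>> cache;  // block id -> cached block
    int diskReads = 0;                               // number of simulated disk reads

    BlockCacheSketch(List<List<String>> blocks, int cacheCapacity) {
        this.blocks = blocks;
        this.firstKeyOfBlock = new String[blocks.size()];
        for (int i = 0; i < blocks.size(); i++) {
            firstKeyOfBlock[i] = blocks.get(i).get(0);
        }
        // accessOrder=true plus removeEldestEntry turns LinkedHashMap into an LRU cache.
        this.cache = new LinkedHashMap<Integer, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, List<String>> eldest) {
                return size() > cacheCapacity;
            }
        };
    }

    /** Binary search over the index, then a linear scan of one data block. */
    boolean contains(String key) {
        int pos = Arrays.binarySearch(firstKeyOfBlock, key);
        int blockId = pos >= 0 ? pos : -pos - 2;     // last block whose first key <= key
        if (blockId < 0) {
            return false;                            // key sorts before every block
        }
        List<String> block = cache.get(blockId);
        if (block == null) {                         // cache miss: "read" the block from disk
            diskReads++;
            block = blocks.get(blockId);
            cache.put(blockId, block);
        }
        for (String k : block) {                     // linear scan within the data block
            int cmp = k.compareTo(key);
            if (cmp == 0) return true;
            if (cmp > 0) break;                      // passed the key's position: not present
        }
        return false;
    }

    public static void main(String[] args) {
        BlockCacheSketch file = new BlockCacheSketch(List.of(
                List.of("a", "b", "c"), List.of("d", "e", "f"), List.of("g", "h")), 2);
        file.contains("e");                          // miss: reads block 1
        file.contains("e");                          // hit: served from the cache
        file.contains("a");                          // miss: reads block 0
        System.out.println(file.diskReads);          // 2
    }
}
```

Here `cacheCapacity` loosely plays the role of `tserver.cache.data.size`, except that the sketch counts blocks rather than bytes.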
 <hr />
 <p><strong> Next:</strong> <a href="Table_Design.html">Table Design</a> <strong> Up:</strong> <a href="accumulo_user_manual.html">Accumulo User Manual Version 1.3</a> <strong> Previous:</strong> <a href="Writing_Accumulo_Clients.html">Writing Accumulo Clients</a>   <strong> <a href="Contents.html">Contents</a></strong></p>
   </div>

Modified: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/Contents.html
==============================================================================
--- websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/Contents.html (original)
+++ websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/Contents.html Tue Nov 15 20:53:16 2011
@@ -166,19 +166,15 @@
 <ul>
 <li><a href="Table_Configuration.html#Setting_Iterators_via_the_Shell">Setting Iterators via the Shell</a></li>
 <li><a href="Table_Configuration.html#Setting_Iterators_Programmatically">Setting Iterators Programmatically</a></li>
+<li><a href="Table_Configuration.html#Versioning_Iterators_and_Timestamps">Versioning Iterators and Timestamps</a></li>
+<li><a href="Table_Configuration.html#Filters">Filters</a></li>
+<li><a href="Table_Configuration.html#Aggregating_Iterators">Aggregating Iterators</a></li>
 </ul>
 </li>
 <li>
-<p><a href="Table_Configuration.html#Versioning_Iterators_and_Timestamps">Versioning Iterators and Timestamps</a></p>
-<ul>
-<li><a href="Table_Configuration.html#Logical_Time">Logical Time</a></li>
-<li><a href="Table_Configuration.html#Deletes">Deletes</a></li>
-</ul>
-</li>
-<li>
-<p><a href="Table_Configuration.html#Filters">Filters</a></p>
+<p><a href="Table_Configuration.html#Block_Cache">Block Cache</a></p>
 </li>
-<li><a href="Table_Configuration.html#Aggregating_Iterators">Aggregating Iterators</a></li>
+<li><a href="Table_Configuration.html#Compaction">Compaction</a></li>
 <li><a href="Table_Configuration.html#Pre-splitting_tables">Pre-splitting tables</a></li>
 <li><a href="Table_Configuration.html#Merging_tablets">Merging tablets</a></li>
 <li><a href="Table_Configuration.html#Delete_Range">Delete Range</a></li>

Modified: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/Shell_Commands.html
==============================================================================
--- websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/Shell_Commands.html (original)
+++ websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/Shell_Commands.html Tue Nov 15 20:53:16 2011
@@ -545,7 +545,7 @@
 </p>
 <div class="codehilite"><pre><span class="err">usage:</span> <span class="err">listscans</span> <span class="err">[-?]</span> <span class="err">[-np]</span> <span class="err">[-ts</span> <span class="err">&lt;tablet</span> <span class="err">server&gt;]</span>   
 <span class="err">description:</span> <span class="err">list</span> <span class="err">what</span> <span class="err">scans</span> <span class="err">are</span> <span class="err">currently</span> <span class="err">running</span> <span class="err">in</span> <span class="err">accumulo.</span> <span class="err">See</span> <span class="err">the</span>   
-       <span class="err">accumulo.core.client.admin.ActiveScan</span> <span class="err">javadoc</span> <span class="err">for</span> <span class="err">more</span> <span class="err">information</span>   
+       <span class="err">org.apache.accumulo.core.client.admin.ActiveScan</span> <span class="err">javadoc</span> <span class="err">for</span> <span class="err">more</span> <span class="err">information</span>   
        <span class="err">about</span> <span class="err">columns.</span>   
   <span class="err">-?,-help</span>  <span class="err">display</span> <span class="err">this</span> <span class="err">help</span>   
   <span class="err">-np,-no-pagination</span>  <span class="err">disables</span> <span class="err">pagination</span> <span class="err">of</span> <span class="err">output</span>   

Modified: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/Table_Configuration.html
==============================================================================
--- websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/Table_Configuration.html (original)
+++ websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/Table_Configuration.html Tue Nov 15 20:53:16 2011
@@ -100,9 +100,8 @@
 <li><a href="Table_Configuration.html#Constraints">Constraints</a></li>
 <li><a href="Table_Configuration.html#Bloom_Filters">Bloom Filters</a></li>
 <li><a href="Table_Configuration.html#Iterators">Iterators</a></li>
-<li><a href="Table_Configuration.html#Versioning_Iterators_and_Timestamps">Versioning Iterators and Timestamps</a></li>
-<li><a href="Table_Configuration.html#Filters">Filters</a></li>
-<li><a href="Table_Configuration.html#Aggregating_Iterators">Aggregating Iterators</a></li>
+<li><a href="Table_Configuration.html#Block_Cache">Block Cache</a></li>
+<li><a href="Table_Configuration.html#Compaction">Compaction</a></li>
 <li><a href="Table_Configuration.html#Pre-splitting_tables">Pre-splitting tables</a></li>
 <li><a href="Table_Configuration.html#Merging_tablets">Merging tablets</a></li>
 <li><a href="Table_Configuration.html#Delete_Range">Delete Range</a></li>
@@ -110,7 +109,7 @@
 </ul>
 <hr />
 <h2 id="a_idtable_configurationa_table_configuration"><a id=Table_Configuration></a> Table Configuration</h2>
-<p>Accumulo tables have a few options that can be configured to alter the default behavior of Accumulo as well as improve performance based on the data stored. These include locality groups, constraints, and iterators. </p>
+<p>Accumulo tables have a few options that can be configured to alter the default behavior of Accumulo as well as improve performance based on the data stored. These include locality groups, constraints, bloom filters, iterators, and block cache. </p>
 <h2 id="a_idlocality_groupsa_locality_groups"><a id=Locality_Groups></a> Locality Groups</h2>
 <p>Accumulo supports storing sets of column families separately on disk to allow clients to efficiently scan over columns that are frequently used together and to avoid scanning over column families that are not requested. After a locality group is set, Scanner and BatchScanner operations will automatically take advantage of them whenever the fetchColumnFamilies() method is used. </p>
 <p>By default, tables place all column families into the same "default" locality group. Additional locality groups can be configured anytime via the shell or programmatically as follows: </p>
@@ -212,7 +211,7 @@ accumulo/docs/examples/README.bloom . </
 
 
 <p>Tables support separate Iterator settings to be applied at scan time, upon minor compaction and upon major compaction. For most uses, tables will have identical iterator settings for all three to avoid inconsistent results. </p>
-<h2 id="a_idversioning_iterators_and_timestampsa_versioning_iterators_and_timestamps"><a id=Versioning_Iterators_and_Timestamps></a> Versioning Iterators and Timestamps</h2>
+<h3 id="a_idversioning_iterators_and_timestampsa_versioning_iterators_and_timestamps"><a id=Versioning_Iterators_and_Timestamps></a> Versioning Iterators and Timestamps</h3>
 <p>Accumulo provides the capability to manage versioned data through the use of timestamps within the Key. If a timestamp is not specified in the key created by the client then the system will set the timestamp to the current time. Two keys with identical rowIDs and columns but different timestamps are considered two versions of the same key. If two inserts are made into accumulo with the same rowID, column, and timestamp, then the behavior is non-deterministic. </p>
 <p>Timestamps are sorted in descending order, so the most recent data comes first. Accumulo can be configured to return the top k versions, or versions later than a given date. The default is to return the one most recent version. </p>
 <p>The version policy can be changed by changing the VersioningIterator options for a table as follows: </p>
@@ -227,16 +226,16 @@ accumulo/docs/examples/README.bloom . </
 </pre></div>
 
 
-<h3 id="a_idlogical_timea_logical_time"><a id=Logical_Time></a> Logical Time</h3>
+<h4 id="a_idlogical_timea_logical_time"><a id=Logical_Time></a> Logical Time</h4>
 <p>Accumulo 1.2 introduces the concept of logical time. This ensures that timestamps set by accumulo always move forward. This helps avoid problems caused by TabletServers that have different time settings. The per tablet counter gives unique one up time stamps on a per mutation basis. When using time in milliseconds, if two things arrive within the same millisecond then both receive the same timestamp. When using time in milliseconds, accumulo set times will still always move forward and never backwards. </p>
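The two timestamp modes described in the Logical Time paragraph above can be contrasted with a hypothetical per-tablet clock. In this sketch, logical time is a one-up counter, and millisecond time takes the maximum of the wall clock and the last issued value so that it never moves backwards. Using `Math.max` for that guarantee is an assumption about one way to achieve it, not a description of Accumulo's code, and `TabletClockSketch` is an illustrative name; a real table uses only one of the two modes.

```java
// Hypothetical model of the two time modes from the Logical Time paragraph.
// Not Accumulo code; Math.max is just one way to guarantee time never moves backwards.
class TabletClockSketch {
    private long logicalCounter = 0;            // per-tablet one-up counter
    private long lastMillis = Long.MIN_VALUE;   // last millisecond timestamp handed out

    /** Logical time: every mutation gets a unique, strictly increasing timestamp. */
    long nextLogicalTimestamp() {
        return ++logicalCounter;
    }

    /** Millisecond time: ties are possible, but the value never decreases. */
    long nextMillisTimestamp(long wallClockMillis) {
        lastMillis = Math.max(lastMillis, wallClockMillis);
        return lastMillis;
    }

    public static void main(String[] args) {
        TabletClockSketch clock = new TabletClockSketch();
        System.out.println(clock.nextLogicalTimestamp());     // 1
        System.out.println(clock.nextLogicalTimestamp());     // 2
        System.out.println(clock.nextMillisTimestamp(1000L)); // 1000
        System.out.println(clock.nextMillisTimestamp(1000L)); // 1000: same millisecond, same timestamp
        System.out.println(clock.nextMillisTimestamp(900L));  // 1000: a lagging clock does not move time back
    }
}
```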
 <p>A table can be configured to use logical timestamps at creation time as follows: </p>
 <div class="codehilite"><pre><span class="n">user</span><span class="nv">@myinstance</span><span class="o">&gt;</span> <span class="n">createtable</span> <span class="o">-</span><span class="n">tl</span> <span class="n">logical</span>
 </pre></div>
 
 
-<h3 id="a_iddeletesa_deletes"><a id=Deletes></a> Deletes</h3>
+<h4 id="a_iddeletesa_deletes"><a id=Deletes></a> Deletes</h4>
 <p>Deletes are special keys in accumulo that get sorted along with all the other data. When a delete key is inserted, accumulo will not show anything that has a timestamp less than or equal to the delete key. During major compaction, any keys older than a delete key are omitted from the new file created, and the omitted keys are removed from disk as part of the regular garbage collection process. </p>
-<h2 id="a_idfiltersa_filters"><a id=Filters></a> Filters</h2>
+<h3 id="a_idfiltersa_filters"><a id=Filters></a> Filters</h3>
 <p>When scanning over a set of key-value pairs it is possible to apply an arbitrary filtering policy through the use of a Filter. Filters are types of iterators that return only key-value pairs that satisfy the filter logic. Accumulo has a few built-in filters that can be configured on any table: AgeOff, ColumnAgeOff, Timestamp, NoVis, and RegEx. More can be added by writing a Java class that extends the <br />
 org.apache.accumulo.core.iterators.Filter class. </p>
 <p>The AgeOff filter can be configured to remove data older than a certain date or a fixed amount of time from the present. The following example sets a table to delete everything inserted over 30 seconds ago: </p>
@@ -278,7 +277,7 @@ org.apache.accumulo.core.iterators.Filte
 </pre></div>
 
 
-<h2 id="a_idaggregating_iteratorsa_aggregating_iterators"><a id=Aggregating_Iterators></a> Aggregating Iterators</h2>
+<h3 id="a_idaggregating_iteratorsa_aggregating_iterators"><a id=Aggregating_Iterators></a> Aggregating Iterators</h3>
 <p>Accumulo allows aggregating iterators to be configured on tables and column families. When an aggregating iterator is set, the iterator is applied across the values associated with any keys that share rowID, column family, and column qualifier. This is similar to the reduce step in MapReduce, which applies some function to all the values associated with a particular key. </p>
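The aggregation behavior described above, where values whose keys share rowID, column family, and column qualifier are combined much like a MapReduce reduce step, can be sketched as a group-and-sum. Summation is only one possible aggregation function, chosen here for illustration; the `SumAggregationSketch` class and its `Cell` type are hypothetical and unrelated to the `org.apache.accumulo.core.iterators.aggregation.Aggregator` interface.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: combine values whose keys share row, column family and column
// qualifier (timestamps are ignored), using summation as the example function.
// Not related to Accumulo's Aggregator interface.
class SumAggregationSketch {

    static final class Cell {
        final String row, family, qualifier;
        final long timestamp;
        final long value;

        Cell(String row, String family, String qualifier, long timestamp, long value) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Returns one summed value per "row family:qualifier" key, in sorted key order. */
    static Map<String, Long> aggregate(List<Cell> cells) {
        Map<String, Long> sums = new TreeMap<>();
        for (Cell c : cells) {
            String key = c.row + " " + c.family + ":" + c.qualifier; // timestamp deliberately left out
            sums.merge(key, c.value, Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        List<Cell> inserted = List.of(
                new Cell("row1", "count", "hits", 1L, 1L),
                new Cell("row1", "count", "hits", 2L, 3L),
                new Cell("row2", "count", "hits", 1L, 5L));
        System.out.println(aggregate(inserted)); // {row1 count:hits=4, row2 count:hits=5}
    }
}
```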
 <p>For example, if an aggregating iterator were configured on a table and the following mutations were inserted: </p>
 <div class="codehilite"><pre><span class="n">Row</span>     <span class="n">Family</span> <span class="n">Qualifier</span> <span class="n">Timestamp</span>  <span class="n">Value</span>
@@ -319,7 +318,49 @@ org.apache.accumulo.core.iterators.Filte
 <p>Additional Aggregators can be added by creating a Java class that implements <br />
 <strong>org.apache.accumulo.core.iterators.aggregation.Aggregator</strong> and adding a jar containing that class to Accumulo's lib directory. </p>
 <p>An example of an aggregator can be found under <br />
-accumulo/src/examples/main/java/accumulo/examples/aggregation/SortedSetAggregator.java </p>
+accumulo/src/examples/main/java/org/apache/accumulo/examples/aggregation/SortedSetAggregator.java </p>
+<h2 id="a_idblock_cachea_block_cache"><a id=Block_Cache></a> Block Cache</h2>
+<p>To increase throughput for commonly accessed entries, Accumulo employs a block cache. This block cache buffers data in memory so that it does not have to be read from disk. The RFile format that Accumulo prefers is a mix of index blocks and data blocks, where the index blocks are used to find the appropriate data blocks. Typical queries to Accumulo result in a binary search over several index blocks followed by a linear scan of one or more data blocks. </p>
+<p>The block cache can be configured on a per-table basis, and all tablets hosted on a tablet server share a single resource pool. To configure the size of the tablet server's block cache, set the following properties: </p>
+<div class="codehilite"><pre><span class="n">tserver</span><span class="o">.</span><span class="n">cache</span><span class="o">.</span><span class="n">data</span><span class="o">.</span><span class="n">size:</span> <span class="n">Specifies</span> <span class="n">the</span> <span class="n">size</span> <span class="n">of</span> <span class="n">the</span> <span class="n">cache</span> <span class="k">for</span> <span class="n">file</span> <span class="n">data</span> <span class="n">blocks</span><span class="o">.</span>
+<span class="n">tserver</span><span class="o">.</span><span class="n">cache</span><span class="o">.</span><span class="nb">index</span><span class="o">.</span><span class="n">size:</span> <span class="n">Specifies</span> <span class="n">the</span> <span class="n">size</span> <span class="n">of</span> <span class="n">the</span> <span class="n">cache</span> <span class="k">for</span> <span class="n">file</span> <span class="n">indices</span><span class="o">.</span>
+</pre></div>
+
+
+<p>To enable the block cache for your table, set the following properties: </p>
+<div class="codehilite"><pre><span class="n">table</span><span class="o">.</span><span class="n">cache</span><span class="o">.</span><span class="n">block</span><span class="o">.</span><span class="n">enable:</span> <span class="n">Determines</span> <span class="n">whether</span> <span class="n">file</span> <span class="p">(</span><span class="n">data</span><span class="p">)</span> <span class="n">block</span> <span class="n">cache</span> <span class="n">is</span> <span class="n">enabled</span><span class="o">.</span>
+<span class="n">table</span><span class="o">.</span><span class="n">cache</span><span class="o">.</span><span class="nb">index</span><span class="o">.</span><span class="n">enable:</span> <span class="n">Determines</span> <span class="n">whether</span> <span class="nb">index</span> <span class="n">cache</span> <span class="n">is</span> <span class="n">enabled</span><span class="o">.</span>
+</pre></div>
+
+
+<p>The block cache can significantly alleviate hot spots and reduce query latency. It is enabled by default for the !METADATA table. </p>
+<h2 id="a_idcompactiona_compaction"><a id=Compaction></a> Compaction</h2>
+<p>As data is written to Accumulo, it is buffered in memory. The data buffered in memory is eventually written to HDFS on a per-tablet basis. Files can also be added to tablets directly by bulk import. In the background, tablet servers run major compactions to merge multiple files into one. The tablet server has to decide which tablets to compact and which files within a tablet to compact. This decision is made using the compaction ratio, which is configurable on a per-table basis. To configure this ratio, modify the following property: </p>
+<div class="codehilite"><pre><span class="n">table</span><span class="o">.</span><span class="n">compaction</span><span class="o">.</span><span class="n">major</span><span class="o">.</span><span class="n">ratio</span>
+</pre></div>
+
+
+<p>Increasing this ratio will result in more files per tablet and less compaction work. More files per tablet means higher query latency, so adjusting this ratio is a trade-off between ingest and query performance. The ratio defaults to 3. </p>
+<p>The ratio works as follows: a set of files is compacted into one file if the sum of the sizes of the files in the set is larger than the ratio multiplied by the size of the largest file in the set. If this is not true for the set of all files in a tablet, the largest file is removed from consideration, and the remaining files are considered for compaction. This is repeated until a compaction is triggered or there are no files left to consider. </p>
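The selection rule in the paragraph above is essentially a loop, so a short Java sketch may make it easier to follow. It models the rule as written here, not Accumulo's implementation: `CompactionRatioSketch` and `selectFiles` are hypothetical names, sizes are plain numbers (MB in the example), and bulk imports, thread limits, and `table.file.max` are ignored.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the compaction-ratio rule (sizes in MB for readability).
// It models the rule described in the manual, not Accumulo's source code.
class CompactionRatioSketch {

    /** Returns the file sizes chosen for one major compaction, or an empty list if none qualifies. */
    static List<Long> selectFiles(List<Long> fileSizes, double ratio) {
        List<Long> candidates = new ArrayList<>(fileSizes);
        Collections.sort(candidates);                   // ascending, so the largest file is last
        while (!candidates.isEmpty()) {
            long total = 0;
            for (long size : candidates) total += size;
            long largest = candidates.get(candidates.size() - 1);
            if (total > ratio * largest) {
                return candidates;                      // compact this whole set into one file
            }
            candidates.remove(candidates.size() - 1);   // drop the largest file and retry
        }
        return candidates;                              // empty: nothing to compact
    }

    public static void main(String[] args) {
        List<Long> tablet = List.of(200L, 1L);
        System.out.println(selectFiles(tablet, 1.0)); // [1, 200]: 201 > 1 * 200
        System.out.println(selectFiles(tablet, 3.0)); // []: 201 <= 600, then 1 <= 3
    }
}
```

With a ratio of 1, the 200M plus 1M case discussed later in this section is selected immediately, while the default ratio of 3 leaves both files alone, which is the reduced compaction work the manual attributes to a higher ratio.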
+<p>The number of background threads tablet servers use to run major compactions is configurable. To configure this, modify the following property: </p>
+<div class="codehilite"><pre><span class="n">tserver</span><span class="o">.</span><span class="n">compaction</span><span class="o">.</span><span class="n">major</span><span class="o">.</span><span class="n">concurrent</span><span class="o">.</span><span class="n">max</span>
+</pre></div>
+
+
+<p>The number of threads tablet servers use for minor compactions is also configurable. To configure this, modify the following property: </p>
+<div class="codehilite"><pre><span class="n">tserver</span><span class="o">.</span><span class="n">compaction</span><span class="o">.</span><span class="n">minor</span><span class="o">.</span><span class="n">concurrent</span><span class="o">.</span><span class="n">max</span>
+</pre></div>
+
+
+<p>The numbers of minor and major compactions running and queued are visible on the Accumulo monitor page. This shows whether compactions are backing up and whether the settings above need adjusting. When adjusting the number of threads available for compactions, consider the number of cores and other tasks running on the nodes, such as map and reduce tasks. </p>
+<p>If major compactions are not keeping up, then the number of files per tablet will
grow to a point such that query performance starts to suffer. One way to handle this situation
is to increase the compaction ratio. For example, if the compaction ratio were set to 1, then
every new file added to a tablet by minor compaction would immediately queue the tablet for
major compaction. So if a tablet has a 200M file and minor compaction writes a 1M file, then
the major compaction will attempt to merge the 200M and 1M files. If the tablet server has
many tablets trying to do this, major compactions will back up and the
number of files per tablet will start to grow, assuming data is being continuously written.
Increasing the compaction ratio will alleviate backups by lowering the amount of major compaction
work that needs to be done. </p>
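The effect of the ratio on major compaction work can be illustrated with a toy simulation. It is a hypothetical model, not Accumulo code: one tablet starts with a single large file, each minor compaction adds a small file, and after every flush major compactions run (using the selection rule described at the start of this section) until no set of files qualifies. `RatioWorkSimulation`, `select`, and `run` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model, not Accumulo code: one tablet, one large initial file,
// and a stream of small minor-compaction outputs.
final class RatioWorkSimulation {
    private RatioWorkSimulation() {}

    // Selection rule from the text: drop the largest file until total > ratio * largest.
    static List<Long> select(List<Long> files, double ratio) {
        List<Long> candidates = new ArrayList<>(files);
        candidates.sort(Collections.reverseOrder());
        while (!candidates.isEmpty()) {
            long total = 0;
            for (long size : candidates) {
                total += size;
            }
            if (total > ratio * candidates.get(0)) {
                return candidates;
            }
            candidates.remove(0);
        }
        return candidates;
    }

    // Returns {amount of data rewritten by major compactions, final file count}.
    static long[] run(long initialFile, long flushSize, int flushes, double ratio) {
        List<Long> files = new ArrayList<>(List.of(initialFile));
        long rewritten = 0;
        for (int i = 0; i < flushes; i++) {
            files.add(flushSize); // one minor compaction writes one new file
            List<Long> chosen = select(files, ratio);
            // Merging a single file gains nothing, so require at least two.
            while (chosen.size() > 1) {
                long merged = 0;
                for (long size : chosen) {
                    merged += size;
                    files.remove(Long.valueOf(size)); // remove one file of this size
                }
                files.add(merged);   // the major compaction's output file
                rewritten += merged; // every input byte is rewritten once
                chosen = select(files, ratio);
            }
        }
        return new long[] {rewritten, files.size()};
    }
}
```

For instance, `run(200, 1, 100, 1.0)` reports 25050 units rewritten with one file left, because the whole tablet is rewritten after every 1-unit flush. With a ratio of 3, far less data is rewritten, at the cost of leaving more files in the tablet.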
+<p>Another way to keep the number of files per tablet from growing too large is to adjust
the following property: </p>
+<div class="codehilite"><pre><span class="n">table</span><span
class="o">.</span><span class="n">file</span><span class="o">.</span><span
class="n">max</span>
+</pre></div>
+
+
+<p>When a tablet reaches this number of files and needs to flush its in-memory data
to disk, it will do a merging minor compaction instead. A merging minor compaction
merges the tablet's smallest file with the in-memory data at minor compaction time, so
the number of files does not grow beyond this limit. However, this makes minor compactions take
longer, which can slow ingest until major compactions have had enough time to catch up. When adjusting this property, also consider
adjusting the compaction ratio. Ideally, merging minor compactions never need to occur and
major compactions keep up. It is possible to configure the file max and compaction ratio
such that only merging minor compactions occur and major compactions never occur. This should
be avoided because doing only merging minor compactions causes <img alt="$O(N^2)$" src="img2.png"
/> work to be done. The amount of work done by major compactions
is <img alt="$O(N*\log_R(N))$" src="img3.png" /> where <em>R</em>
is the compaction ratio. </p>
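The two cost bounds can be motivated with a short derivation. It is a sketch under simplifying assumptions, not a formal analysis: each flush writes one unit of data, N units are written in total, F stands for the table.file.max value, R is the compaction ratio (R > 1), and with only merging minor compactions the F files are assumed to stay roughly equal in size.

```latex
% Assumptions: one unit per flush, N units total, F = table.file.max, R > 1.
\begin{align*}
W_{\text{merging minor}} &\approx \sum_{k=1}^{N} \frac{k}{F} = \frac{N(N+1)}{2F} = O(N^2)
  && \text{(the smallest of $F$ files holds about $k/F$ units at flush $k$)} \\
W_{\text{major}} &\le N \log_R N = O(N \log_R N)
  && \text{(each rewrite multiplies an entry's file size by more than $R$)}
\end{align*}
```

The second bound follows from the selection rule: a chosen set satisfies total > R × largest, so the file holding any given entry grows by more than a factor of R each time the entry is rewritten. Since no file exceeds N units, each unit is rewritten fewer than log_R N times.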
+<p>Compactions can be initiated manually for a table. To initiate a minor compaction,
use the flush command in the shell. To initiate a major compaction, use the compact command
in the shell. The compact command compacts each tablet in a table down to one file; even tablets
that already have only one file are compacted. This is useful when a major compaction filter is
configured for a table. Version 1.4 added the ability to compact a range of a table. To use
this feature, specify start and stop rows for the compact command. Only tablets
that overlap the given row range will be compacted. </p>
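Which tablets a range compaction touches can be sketched from a table's split points. This is a hypothetical illustration, not Accumulo code. It assumes tablet i covers rows in (split[i-1], split[i]], with no lower bound for the first tablet and no upper bound for the last, and it treats the start and stop rows as inclusive; the shell's exact inclusivity may differ. `RangeCompactionSketch` and `overlappingTablets` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: find the tablets that overlap a row range [startRow, endRow].
// Assumes tablet i covers rows in (splits[i-1], splits[i]]; the first tablet has no
// lower bound and the last tablet has no upper bound.
final class RangeCompactionSketch {
    private RangeCompactionSketch() {}

    // Returns the indexes of tablets that overlap [startRow, endRow].
    static List<Integer> overlappingTablets(List<String> splits, String startRow, String endRow) {
        List<Integer> result = new ArrayList<>();
        int tablets = splits.size() + 1;
        for (int i = 0; i < tablets; i++) {
            String prevEnd = (i == 0) ? null : splits.get(i - 1);           // exclusive lower bound
            String end = (i == splits.size()) ? null : splits.get(i);       // inclusive upper bound
            boolean before = end != null && end.compareTo(startRow) < 0;    // every row < startRow
            boolean after = prevEnd != null && prevEnd.compareTo(endRow) >= 0; // every row > endRow
            if (!before && !after) {
                result.add(i);
            }
        }
        return result;
    }
}
```

With split points g, n, and t, the range h to m overlaps only tablet 1, which covers (g, n].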
 <h2 id="a_idpre-splitting_tablesa_pre-splitting_tables"><a id=Pre-splitting_tables></a>
Pre-splitting tables</h2>
 <p>Accumulo will balance and distribute tables across servers. Before a table gets
large, it will be maintained as a single tablet on a single server. This limits the speed
at which data can be added or queried to the speed of a single node. To improve performance
when a table is new or small, you can add split points and generate new tablets. </p>
 <p>In the shell: </p>

Modified: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/Table_Design.html
==============================================================================
--- websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/Table_Design.html
(original)
+++ websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/Table_Design.html
Tue Nov 15 20:53:16 2011
@@ -164,7 +164,7 @@
 <p>Appending dates provides the additional capability of restricting a scan to a given
date range. </p>
 <h2 id="a_idindexinga_indexing"><a id=Indexing></a> Indexing</h2>
 <p>In order to support lookups via more than one attribute of an entity, additional
indexes can be built. However, because Accumulo tables can support any number of columns without
specifying them beforehand, a single additional index will often suffice for supporting lookups
of records in the main table. Here, the index table uses the Value or Term from the
main table as its rowID, keeps the same column families, and stores the rowID from the
main table in its column qualifier. </p>
-<p><img alt="converted table" src="img2.png" /></p>
+<p><img alt="converted table" src="img4.png" /></p>
 <p>Note: We store rowIDs in the column qualifier rather than the Value so that we can
have more than one rowID associated with a particular term within the index. If we stored
this in the Value we would only see one of the rows in which the value appears, since Accumulo
is configured by default to return only the most recent value associated with a key. </p>
 <p>Lookups can then be done by first scanning the Index Table for occurrences of the
desired values in the specified columns, which returns a list of rowIDs from the main table.
These can then be used to retrieve each matching record, in its entirety or a subset of
its columns, from the Main Table. </p>
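The index layout and the two-step lookup can be modeled in memory. This is a hypothetical sketch with no Accumulo client calls: a main-table cell is simplified to (rowID, column family, value), with column qualifiers omitted, and the matching index entry uses the value as its row, the same column family, and the main-table rowID as its qualifier. `TermIndexSketch`, `put`, and `lookup` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical in-memory model of the index layout, not Accumulo client code.
final class TermIndexSketch {
    // Index: term (index rowID) -> column family -> main-table rowIDs (index qualifiers).
    private final Map<String, Map<String, SortedSet<String>>> index = new TreeMap<>();
    // Main table: rowID -> column family -> value (qualifiers omitted for brevity).
    private final Map<String, Map<String, String>> mainTable = new TreeMap<>();

    void put(String rowId, String family, String value) {
        mainTable.computeIfAbsent(rowId, r -> new TreeMap<>()).put(family, value);
        // Storing the rowID as a qualifier lets one term point to many rows.
        index.computeIfAbsent(value, t -> new TreeMap<>())
             .computeIfAbsent(family, f -> new TreeSet<>())
             .add(rowId);
    }

    // Step 1: scan the index for the term in one column; step 2: fetch each row.
    List<Map<String, String>> lookup(String family, String value) {
        SortedSet<String> rowIds = index.getOrDefault(value, Map.of())
                                        .getOrDefault(family, new TreeSet<>());
        List<Map<String, String>> records = new ArrayList<>();
        for (String rowId : rowIds) {
            records.add(mainTable.get(rowId));
        }
        return records;
    }
}
```

Because the rowIDs live in a set under the term, two records that share a city value both come back from a single index lookup.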
 <p>To support efficient lookups of multiple rowIDs from the same table, the Accumulo
client library provides a BatchScanner. Users specify a set of Ranges to the BatchScanner,
which performs the lookups in multiple threads to multiple servers and returns an Iterator
over all the rows retrieved. Unlike the rows returned by the basic Scanner interface, the rows
returned are NOT in sorted order. </p>
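Why the results come back unsorted can be illustrated with plain Java concurrency. This is a hypothetical sketch of the idea, not the BatchScanner API: each row range is looked up on its own thread against an in-memory sorted map, and results are gathered in whatever order the lookups finish. `ParallelLookupSketch` and `lookup` are illustrative names, and treating both range ends as inclusive is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration of parallel range lookups, not the BatchScanner API.
final class ParallelLookupSketch {
    private ParallelLookupSketch() {}

    // Each range is {startRow, endRow}, both inclusive (an assumption for this sketch).
    static List<String> lookup(NavigableMap<String, String> table, List<String[]> ranges, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            CompletionService<List<String>> done = new ExecutorCompletionService<>(pool);
            for (String[] range : ranges) {
                // Read-only access to the shared map from several threads.
                done.submit(() -> new ArrayList<>(table.subMap(range[0], true, range[1], true).values()));
            }
            List<String> results = new ArrayList<>();
            for (int i = 0; i < ranges.size(); i++) {
                try {
                    results.addAll(done.take().get()); // whichever lookup finishes next
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for lookups", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a lookup failed", e.getCause());
                }
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Callers that need sorted output have to sort the combined results themselves, which is the trade-off for fetching many ranges in parallel.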
@@ -197,9 +197,9 @@
 <p>Accumulo is ideal for storing entities and their attributes, especially if the attributes
are sparse. It is often useful to join several datasets together on common entities within
the same table. This can allow for the representation of graphs, including nodes, their attributes,
and connections to other nodes. </p>
 <p>Rather than storing individual events, Entity-Attribute or Graph tables store aggregate
information about the entities involved in the events and the relationships between entities.
This is often preferable when single events aren't very useful and when a continuously updated
summarization is desired. </p>
 <p>The physical schema for an entity-attribute or graph table is as follows: </p>
-<p><img alt="converted table" src="img3.png" /></p>
+<p><img alt="converted table" src="img5.png" /></p>
 <p>For example, to keep track of employees, managers and products the following entity-attribute
table could be used. Note that the weights are not always necessary and are set to 0 when
not used. </p>
-<p><img alt="converted table" src="img4.png" /> <br />
+<p><img alt="converted table" src="img6.png" /> <br />
 </p>
 <p>To allow efficient updating of edge weights, an aggregating iterator can be configured
to add the value of all mutations applied with the same key. These types of tables can easily
be created from raw events by simply extracting the entities, attributes, and relationships
from individual events and inserting the keys into Accumulo each with a count of 1. The aggregating
iterator will take care of maintaining the edge weights. </p>
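The counting scheme can be modeled with a map whose values are summed per key. This is a hypothetical toy model, not Accumulo's iterator API, and the (entity, relationship, other entity) key layout is an assumption. For simplicity it sums eagerly at write time, whereas an aggregating iterator combines values as data is scanned or compacted. `EdgeWeightSketch`, `addEvent`, and `weight` are illustrative names.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of edge-weight aggregation, not Accumulo's iterator API.
final class EdgeWeightSketch {
    // Key = (entity, relationship, other entity); the exact key layout is an assumption.
    private final Map<List<String>, Long> weights = new HashMap<>();

    // Ingest one raw event: insert its edge key with a count of 1.
    void addEvent(String from, String relation, String to) {
        weights.merge(List.of(from, relation, to), 1L, Long::sum); // sum values for the same key
    }

    long weight(String from, String relation, String to) {
        return weights.getOrDefault(List.of(from, relation, to), 0L);
    }
}
```

Two events linking the same manager and employee yield an edge weight of 2, and an edge that was never seen reads as 0.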
 <h2 id="a_iddocument-partitioned_indexinga_document-partitioned_indexing"><a id=Document-Partitioned_Indexing></a>
Document-Partitioned Indexing</h2>
@@ -207,7 +207,7 @@
 <p>The first problem is that the set of all records matching any one of the search terms must be
sent to the client, which incurs a lot of network traffic. The second problem is that the
client is responsible for performing set intersection on the sets of records returned to eliminate
all but the records matching all search terms. The memory of the client may easily be overwhelmed
during this operation. </p>
 <p>For these reasons Accumulo includes support for a scheme known as sharded indexing,
in which these set operations can be performed at the TabletServers and decisions about which
records to include in the result set can be made without incurring network traffic. </p>
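The set operation that moves to the server can be sketched as a merge-style intersection of sorted document-ID lists, one list per search term within a bin. This is a hypothetical sketch of the general technique, not the code of Accumulo's Intersecting Iterator. `BinIntersectionSketch` and `intersect` are illustrative names, and each input list is assumed to be sorted and free of duplicates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: intersect sorted, duplicate-free document-ID lists (one per term)
// so that only documents containing every term leave the server.
final class BinIntersectionSketch {
    private BinIntersectionSketch() {}

    static List<String> intersect(List<List<String>> postings) {
        List<String> result = new ArrayList<>();
        if (postings.isEmpty()) {
            return result;
        }
        int[] pos = new int[postings.size()]; // one cursor per term list
        while (true) {
            // Find the largest current document ID; stop when any list is exhausted.
            String max = null;
            for (int i = 0; i < postings.size(); i++) {
                if (pos[i] >= postings.get(i).size()) {
                    return result;
                }
                String cur = postings.get(i).get(pos[i]);
                if (max == null || cur.compareTo(max) > 0) {
                    max = cur;
                }
            }
            // Advance every cursor that is behind max; if none moved, all lists agree.
            boolean allEqual = true;
            for (int i = 0; i < postings.size(); i++) {
                List<String> list = postings.get(i);
                while (pos[i] < list.size() && list.get(pos[i]).compareTo(max) < 0) {
                    pos[i]++;
                    allEqual = false;
                }
            }
            if (allEqual) {
                result.add(max); // document contains every term
                for (int i = 0; i < pos.length; i++) {
                    pos[i]++;
                }
            }
        }
    }
}
```

Each cursor only moves forward, so one pass over the lists is enough, and only the matching document IDs would need to cross the network.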
 <p>This is accomplished by partitioning records into bins that each reside on at most
one TabletServer, and then creating an index of terms per record within each bin as follows:
</p>
-<p><img alt="converted table" src="img5.png" /></p>
+<p><img alt="converted table" src="img7.png" /></p>
 <p>Documents or records are mapped into bins by a user-defined ingest application.
By storing the BinID as the RowID we ensure that all the information for a particular bin
is contained in a single tablet and hosted on a single TabletServer since Accumulo never splits
rows across tablets. Storing the Terms as column families serves to enable fast lookups of
all the documents within this bin that contain the given term. </p>
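The ingest side of this scheme can be sketched as follows. It is hypothetical: hashing the document ID into a fixed number of bins is just one possible mapping (the text leaves the mapping to the user's ingest application), and entries are returned as {row, family, qualifier} string triples with row = BinID, column family = term, and column qualifier = document ID. `BinningSketch`, `binFor`, and `indexEntries` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical ingest-side sketch of document-partitioned index entries.
final class BinningSketch {
    private BinningSketch() {}

    // One possible binning choice: hash the document ID (zero-padded so bin IDs sort).
    static String binFor(String docId, int numBins) {
        int bin = Math.floorMod(docId.hashCode(), numBins);
        return String.format("bin_%03d", bin);
    }

    // Returns {row = binID, family = term, qualifier = docID} for each distinct term.
    static List<String[]> indexEntries(String docId, String text, int numBins) {
        String bin = binFor(docId, numBins);
        List<String[]> entries = new ArrayList<>();
        for (String term : new TreeSet<>(Arrays.asList(text.trim().toLowerCase().split("\\s+")))) {
            if (term.isEmpty()) {
                continue; // skip the empty token produced by empty input
            }
            entries.add(new String[] {bin, term, docId});
        }
        return entries;
    }
}
```

All entries for a document share one row, so they land in a single tablet, and a term lookup within a bin becomes a scan of one column family.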
 <p>Finally, we perform set intersection operations on the TabletServer via a special
iterator called the Intersecting Iterator. Since documents are partitioned into many bins,
a search of all documents must search every bin. We can use the BatchScanner to scan all bins
in parallel. The Intersecting Iterator should be enabled on a BatchScanner within user query
code as follows: </p>
 <div class="codehilite"><pre><span class="n">Text</span><span
class="o">[]</span> <span class="n">terms</span> <span class="o">=</span>
<span class="p">{</span><span class="k">new</span> <span class="n">Text</span><span
class="p">(</span><span class="s">&quot;the&quot;</span><span
class="p">),</span> <span class="k">new</span> <span class="n">Text</span><span
class="p">(</span><span class="s">&quot;white&quot;</span><span
class="p">),</span> <span class="k">new</span> <span class="n">Text</span><span
class="p">(</span><span class="s">&quot;house&quot;</span><span
class="p">)};</span>

Modified: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/img2.png
==============================================================================
Binary files - no diff available.

Modified: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/img3.png
==============================================================================
Binary files - no diff available.

Modified: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/img4.png
==============================================================================
Binary files - no diff available.

Modified: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/img5.png
==============================================================================
Binary files - no diff available.

Added: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/img6.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/img6.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Added: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/img7.png
==============================================================================
Binary file - no diff available.

Propchange: websites/staging/accumulo/trunk/content/accumulo/user_manual_1.4-incubating/img7.png
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream


