hbase-commits mailing list archives

From dm...@apache.org
Subject svn commit: r1242427 - in /hbase/trunk/src/docbkx: book.xml configuration.xml
Date Thu, 09 Feb 2012 18:09:45 GMT
Author: dmeil
Date: Thu Feb  9 18:09:45 2012
New Revision: 1242427

URL: http://svn.apache.org/viewvc?rev=1242427&view=rev
Log:
hbase-5365.  book - Arch/Region/Store adding description of compaction file selection 

Modified:
    hbase/trunk/src/docbkx/book.xml
    hbase/trunk/src/docbkx/configuration.xml

Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1242427&r1=1242426&r2=1242427&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Thu Feb  9 18:09:45 2012
@@ -283,7 +283,8 @@ try {
         <para>HBase does not modify data in place, and so deletes are handled by creating
new markers called <emphasis>tombstones</emphasis>.
         These tombstones, along with the dead values, are cleaned up on major compactions.
         </para>
-        <para>See <xref linkend="version.delete"/> for more information on deleting
versions of columns.         
+        <para>See <xref linkend="version.delete"/> for more information on deleting
versions of columns, and see 
+        <xref linkend="compaction"/> for more information on compactions.         
         </para>
  
       </section>
@@ -588,10 +589,10 @@ admin.enableTable(table);               
       HBase currently does not do well with anything above two or three column families so
keep the number
       of column families in your schema low.  Currently, flushing and compactions are done
on a per Region basis so
       if one column family is carrying the bulk of the data bringing on flushes, the adjacent
families
-      will also be flushed though the amount of data they carry is small.  Compaction is
currently triggered
-      by the total number of files under a column family.  Its not size based.  When many
column families the
+      will also be flushed though the amount of data they carry is small.  When many column
families exist the
       flushing and compaction interaction can make for a bunch of needless i/o loading (To
be addressed by
-      changing flushing and compaction to work on a per column family basis).
+      changing flushing and compaction to work on a per column family basis).  For more information

+      on compactions, see <xref linkend="compaction"/>.
     </para>
     <para>Try to make do with one column family if you can in your schemas.  Only introduce
a
         second and third column family in the case where data access is usually column scoped;
@@ -2136,16 +2137,133 @@ myHtd.setValue(HTableDescriptor.SPLIT_PO
       <section xml:id="compaction">
         <title>Compaction</title>
         <para>There are two types of compactions:  minor and major.  Minor compactions
will usually pick up a couple of the smaller adjacent
-         files and rewrite them as one.  Minors do not drop deletes or expired cells, only
major compactions do this.  Sometimes a minor compaction
-         will pick up all  the files in the store and in this case it actually promotes itself
to being a major compaction.  
-         For a description of how a minor compaction picks files to compact, see the <link
xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/Store.html#836">ascii
diagram in the Store source code.</link>
+         StoreFiles and rewrite them as one.  Minors do not drop deletes or expired cells,
only major compactions do this.  Sometimes a minor compaction
+         will pick up all the StoreFiles in the Store and in this case it actually promotes
itself to being a major compaction.  
          </para>
-         <para>After a major compaction runs there will be a single storefile per store,
and this will help performance usually.  Caution:  major compactions rewrite all of the stores
data and on a loaded system, this may not be tenable;
+         <para>After a major compaction runs there will be a single StoreFile per Store,
and this will help performance usually.  Caution:  major compactions rewrite all of the Store's
data and on a loaded system, this may not be tenable;
              major compactions will usually have to be done manually on large systems.  See
<xref linkend="managed.compactions" />.
         </para>
         <para>Compactions will <emphasis>not</emphasis> perform region
merges.  See <xref linkend="ops.regionmgt.merge"/> for more information on region merging.
         </para>
-      </section>
+        <section xml:id="compaction.file.selection">
+          <title>Compaction File Selection</title>
+          <para>To understand the core algorithm for StoreFile selection, there is
some ASCII-art in the <link xlink:href="http://hbase.apache.org/xref/org/apache/hadoop/hbase/regionserver/Store.html#836">Store
source code</link> that 
+          will serve as useful reference.  It has been copied below:
+<programlisting>
+/* normal skew:
+ *
+ *         older ----> newer
+ *     _
+ *    | |   _
+ *    | |  | |   _
+ *  --|-|- |-|- |-|---_-------_-------  minCompactSize
+ *    | |  | |  | |  | |  _  | |
+ *    | |  | |  | |  | | | | | |
+ *    | |  | |  | |  | | | | | |
+ */
+</programlisting>
+          Important knobs:
+          <itemizedlist>
+            <listitem><code>hbase.hstore.compaction.ratio</code> Ratio used
in compaction
+            file selection algorithm.  (default 1.2F) </listitem>
+            <listitem><code>hbase.hstore.compaction.min</code> (called <code>hbase.hstore.compactionThreshold</code> in 0.90)
(files) Minimum number
+            of StoreFiles per Store to be selected for a compaction to occur.</listitem>
+            <listitem><code>hbase.hstore.compaction.max</code> (files)
Maximum number of StoreFiles to compact per minor compaction.</listitem>
+            <listitem><code>hbase.hstore.compaction.min.size</code> (bytes)

+            Any StoreFile smaller than this setting will automatically be a candidate for
compaction.  Defaults to 
+            regions' memstore flush size (134 mb). </listitem>
+            <listitem><code>hbase.hstore.compaction.max.size</code> (.92)
(bytes) 
+            Any StoreFile larger than this setting will automatically be excluded from compaction.
</listitem>
+            </itemizedlist>
+          </para>
+          <para>The minor compaction StoreFile selection logic is size based, and selects
a file for compaction when the file
+           &lt;= sum(smaller_files) * <code>hbase.hstore.compaction.ratio</code>.
+          </para>                
+        </section>
+        <section xml:id="compaction.file.selection.example1">
+          <title>Minor Compaction File Selection - Example #1 (Basic Example)</title>
+          <para>This example mirrors an example from the unit test <code>TestCompactSelection</code>.
+          <itemizedlist>
+            <listitem><code>hbase.hstore.compaction.ratio</code> = 1.0F
</listitem>
+            <listitem><code>hbase.hstore.compaction.min</code> = 3 (files)
</listitem>
+            <listitem><code>hbase.hstore.compaction.max</code> = 5 (files)
</listitem>
+            <listitem><code>hbase.hstore.compaction.min.size</code> = 10
(bytes) </listitem>
+            <listitem><code>hbase.hstore.compaction.max.size</code> = 1000
(bytes) </listitem>
+          </itemizedlist>
+          The following StoreFiles exist: 100, 50, 23, 12, and 12 bytes apiece (oldest to
newest).
+          With the above parameters, the files that would be selected for minor compaction
are 23, 12, and 12.
+          </para>           
+          <para>Why?
+          <itemizedlist>
+            <listitem>100 --&gt;  No, because sum(50, 23, 12, 12) * 1.0 = 97. </listitem>
+            <listitem>50 --&gt;  No, because sum(23, 12, 12) * 1.0 = 47. </listitem>
+            <listitem>23 --&gt;  Yes, because sum(12, 12) * 1.0 = 24. </listitem>
+            <listitem>12 --&gt;  Yes, because sum(12) * 1.0 = 12. </listitem>
+            <listitem>12 --&gt;  Yes, because the previous file had been included,
and this is included because this 
+          does not exceed the max-file limit of 5.</listitem>
+          </itemizedlist>
+          </para>
+        </section>
+        <section xml:id="compaction.file.selection.example2">
+          <title>Minor Compaction File Selection - Example #2 (Not Enough Files To
Compact)</title>
+          <para>This example mirrors an example from the unit test <code>TestCompactSelection</code>.
+          <itemizedlist>
+            <listitem><code>hbase.hstore.compaction.ratio</code> = 1.0F
</listitem>
+            <listitem><code>hbase.hstore.compaction.min</code> = 3 (files)
</listitem>
+            <listitem><code>hbase.hstore.compaction.max</code> = 5 (files)
</listitem>
+            <listitem><code>hbase.hstore.compaction.min.size</code> = 10
(bytes) </listitem>
+            <listitem><code>hbase.hstore.compaction.max.size</code> = 1000
(bytes) </listitem>
+          </itemizedlist>
+          </para>          
+          <para>The following StoreFiles exist: 100, 25, 12, and 12 bytes apiece (oldest
to newest).
+          With the above parameters, no files would be selected for minor compaction
and no compaction would be started.
+          </para>  
+          <para>Why?
+          <itemizedlist>
+            <listitem>100 --&gt; No, because sum(25, 12, 12) * 1.0 = 47</listitem>
+            <listitem>25 --&gt;  No, because sum(12, 12) * 1.0 = 24</listitem>
+            <listitem>12 --&gt;  No. Candidate because sum(12) * 1.0 = 12, but there
are only 2 files to compact and that is less than the threshold of 3</listitem> 
+            <listitem>12 --&gt;  No. Candidate because the previous StoreFile was,
but there are not enough files to compact</listitem>
+          </itemizedlist>
+          </para>
+        </section>
+        <section xml:id="compaction.file.selection.example3">
+          <title>Minor Compaction File Selection - Example #3 (Limiting Files To Compact)</title>
+          <para>This example mirrors an example from the unit test <code>TestCompactSelection</code>.
+          <itemizedlist>
+            <listitem><code>hbase.hstore.compaction.ratio</code> = 1.0F
</listitem>
+            <listitem><code>hbase.hstore.compaction.min</code> = 3 (files)
</listitem>
+            <listitem><code>hbase.hstore.compaction.max</code> = 5 (files)
</listitem>
+            <listitem><code>hbase.hstore.compaction.min.size</code> = 10
(bytes) </listitem>
+            <listitem><code>hbase.hstore.compaction.max.size</code> = 1000
(bytes) </listitem>
+          </itemizedlist>
+          The following StoreFiles exist: 7, 6, 5, 4, 3, 2, and 1 bytes apiece (oldest to
newest).
+          With the above parameters, the files that would be selected for minor compaction
are 7, 6, 5, 4, 3.         
+          </para>  
+          <para>Why?
+          <itemizedlist>
+            <listitem>7 --&gt;  Yes, because sum(6, 5, 4, 3, 2, 1) * 1.0 = 21.
 Also, 7 is less than the min-size. </listitem>
+            <listitem>6 --&gt;  Yes, because sum(5, 4, 3, 2, 1) * 1.0 = 15.  Also,
6 is less than the min-size. </listitem>
+            <listitem>5 --&gt;  Yes, because sum(4, 3, 2, 1) * 1.0 = 10.  Also,
5 is less than the min-size. </listitem>
+            <listitem>4 --&gt;  Yes, because sum(3, 2, 1) * 1.0 = 6.  Also, 4 is
less than the min-size. </listitem>
+            <listitem>3 --&gt;  Yes, because sum(2, 1) * 1.0 = 3.  Also, 3 is less
than the min-size. </listitem>
+            <listitem>2 --&gt;  No.  Candidate because 2 is less than the min-size, but the max-number
of files to compact has been reached. </listitem>
+            <listitem>1 --&gt;  No.  Candidate because 1 is less than the min-size, but the max-number
of files to compact has been reached. </listitem>
+          </itemizedlist>
+          </para>
+        </section>
+        <section xml:id="compaction.config.impact">
+          <title>Impact of Key Configuration Options</title>
+          <para><code>hbase.hstore.compaction.ratio</code>.  A large ratio
(e.g., 10F) will produce a single giant file.  Conversely, a value of .25F will
+          produce behavior similar to the BigTable compaction algorithm - resulting in 4
StoreFiles.
+          </para>
+          <para><code>hbase.hstore.compaction.min.size</code>.  This defaults
to <code>hbase.hregion.memstore.flush.size</code> (134 mb).  Because
+          this limit represents the "automatic include" limit for all StoreFiles smaller
than this value, this value may need to
+          be adjusted downwards in write-heavy environments where many 1 or 2 mb StoreFiles
are being flushed, because every file
+          will be targeted for compaction, and the resulting files may still be under the
min-size and require further compaction, etc. 
+          </para>
+        </section>
+      </section>  <!--  compaction -->
 
      </section>  <!--  store -->
       

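Note on the min-size paragraph in the book.xml change above: the "automatic include" effect can be checked with a small Python sketch of the size test as the text states it (a file qualifies when it is under the min-size, or no larger than ratio * the newer files). The helper name passes_size_test and the constant MIN_SIZE_DEFAULT are hypothetical and are not HBase APIs; the 134,000,000 figure only approximates the "134 mb" quoted in the text.

```python
MB = 1_000_000
# Assumption: approximates the "134 mb" memstore flush size quoted in the text.
MIN_SIZE_DEFAULT = 134 * MB


def passes_size_test(size, newer_total, ratio=1.2, min_size=MIN_SIZE_DEFAULT):
    """Hypothetical helper, not an HBase API.

    A StoreFile qualifies when it is no larger than the min-size
    ("automatic include") or no larger than ratio * (size of newer files).
    """
    return size <= max(min_size, int(newer_total * ratio))


# A 60 MB StoreFile followed by two 2 MB flushes (4 MB of newer data).
print(passes_size_test(60 * MB, 4 * MB))                    # True
print(passes_size_test(60 * MB, 4 * MB, min_size=10 * MB))  # False
```

With the default min-size the 60 MB file is pulled in alongside the small flushes; with a lower min-size only the ratio test applies and the file is left alone.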
Modified: hbase/trunk/src/docbkx/configuration.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/configuration.xml?rev=1242427&r1=1242426&r2=1242427&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/configuration.xml (original)
+++ hbase/trunk/src/docbkx/configuration.xml Thu Feb  9 18:09:45 2012
@@ -1569,6 +1569,7 @@ of all regions.
       they occur.  They can be administered through the HBase shell, or via 
       <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29">HBaseAdmin</link>.
       </para>
+      <para>For more information about compactions and the compaction file selection
process, see <xref linkend="compaction"/>.</para>
       </section>
       
       </section>
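
For readers who want to check the three file selection examples in the book.xml change, here is a minimal Python sketch of the selection rule as the documentation describes it. It is a simplification written for this note, not the code in Store.java: the function name select_minor_compaction is hypothetical, the sum covers all newer files, and the handling of the max-size setting (skipping only oversized files at the old end) is an assumption.

```python
def select_minor_compaction(sizes, ratio=1.0, min_files=3, max_files=5,
                            min_size=10, max_size=1000):
    """Return the StoreFile sizes picked for a minor compaction.

    `sizes` is ordered oldest to newest, as in the documented examples.
    Hypothetical helper for illustration; not an HBase API.
    """
    # Assumption: files above max_size are skipped while they are the oldest.
    start = 0
    while start < len(sizes) and sizes[start] > max_size:
        start += 1

    # Walk from the oldest file and stop at the first one that is either
    # under min_size or no larger than ratio * (sum of the newer files).
    while len(sizes) - start >= min_files:
        newer_total = sum(sizes[start + 1:])
        if sizes[start] <= max(min_size, newer_total * ratio):
            break
        start += 1

    # Take at most max_files files; give up if fewer than min_files remain.
    selected = sizes[start:start + max_files]
    if len(selected) < min_files:
        return []
    return selected


print(select_minor_compaction([100, 50, 23, 12, 12]))   # [23, 12, 12]
print(select_minor_compaction([100, 25, 12, 12]))       # []
print(select_minor_compaction([7, 6, 5, 4, 3, 2, 1]))   # [7, 6, 5, 4, 3]
```

The three calls use the parameter values listed in the examples (ratio 1.0, min 3 files, max 5 files, min-size 10, max-size 1000) and reproduce the selections the examples describe.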


