hbase-commits mailing list archives

From e...@apache.org
Subject svn commit: r1462679 [12/14] - in /hbase/hbase.apache.org/trunk: ./ book/ case_studies/ community/ configuration/ developer/ getting_started/ ops_mgt/ performance/ rpc/
Date Sat, 30 Mar 2013 00:19:57 GMT
Modified: hbase/hbase.apache.org/trunk/performance.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/performance.html?rev=1462679&r1=1462678&r2=1462679&view=diff
==============================================================================
--- hbase/hbase.apache.org/trunk/performance.html (original)
+++ hbase/hbase.apache.org/trunk/performance.html Sat Mar 30 00:19:55 2013
@@ -1,10 +1,10 @@
 <html><head>
       <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
-   <title>Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning</title><link rel="stylesheet" type="text/css" href="css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="chapter" title="Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning"><div class="titlepage"><div><div><h2 class="title"><a name="performance"></a>Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="#perf.os">1.1. Operating System</a></span></dt><dd><dl><dt><span class="section"><a href="#perf.os.ram">1.1.1. Memory</a></span></dt><dt><span class="section"><a href="#perf.os.64">1.1.2. 64-bit</a></span></dt><dt><span class="section"><a href="#perf.os.swap">1.1.3. Swapping</a></span></dt></dl></dd><dt><span class="section"><a href="#perf.network">1.2. 
 Network</a></span></dt><dd><dl><dt><span class="section"><a href="#perf.network.1switch">1.2.1. Single Switch</a></span></dt><dt><span class="section"><a href="#perf.network.2switch">1.2.2. Multiple Switches</a></span></dt><dt><span class="section"><a href="#perf.network.multirack">1.2.3. Multiple Racks</a></span></dt><dt><span class="section"><a href="#perf.network.ints">1.2.4. Network Interfaces</a></span></dt></dl></dd><dt><span class="section"><a href="#jvm">1.3. Java</a></span></dt><dd><dl><dt><span class="section"><a href="#gc">1.3.1. The Garbage Collector and Apache HBase</a></span></dt></dl></dd><dt><span class="section"><a href="#perf.configurations">1.4. HBase Configurations</a></span></dt><dd><dl><dt><span class="section"><a href="#perf.number.of.regions">1.4.1. Number of Regions</a></span></dt><dt><span class="section"><a href="#perf.compactions.and.splits">1.4.2. Managing Compactions</a></span></dt><dt><span class="section"><a href="#perf.handlers">1.4.3. <code 
 class="varname">hbase.regionserver.handler.count</code></a></span></dt><dt><span class="section"><a href="#perf.hfile.block.cache.size">1.4.4. <code class="varname">hfile.block.cache.size</code></a></span></dt><dt><span class="section"><a href="#perf.rs.memstore.upperlimit">1.4.5. <code class="varname">hbase.regionserver.global.memstore.upperLimit</code></a></span></dt><dt><span class="section"><a href="#perf.rs.memstore.lowerlimit">1.4.6. <code class="varname">hbase.regionserver.global.memstore.lowerLimit</code></a></span></dt><dt><span class="section"><a href="#perf.hstore.blockingstorefiles">1.4.7. <code class="varname">hbase.hstore.blockingStoreFiles</code></a></span></dt><dt><span class="section"><a href="#perf.hregion.memstore.block.multiplier">1.4.8. <code class="varname">hbase.hregion.memstore.block.multiplier</code></a></span></dt><dt><span class="section"><a href="#hbase.regionserver.checksum.verify">1.4.9. <code class="varname">hbase.regionserver.checksum.verify</
 code></a></span></dt></dl></dd><dt><span class="section"><a href="#perf.zookeeper">1.5. ZooKeeper</a></span></dt><dt><span class="section"><a href="#perf.schema">1.6. Schema Design</a></span></dt><dd><dl><dt><span class="section"><a href="#perf.number.of.cfs">1.6.1. Number of Column Families</a></span></dt><dt><span class="section"><a href="#perf.schema.keys">1.6.2. Key and Attribute Lengths</a></span></dt><dt><span class="section"><a href="#schema.regionsize">1.6.3. Table RegionSize</a></span></dt><dt><span class="section"><a href="#schema.bloom">1.6.4. Bloom Filters</a></span></dt><dt><span class="section"><a href="#schema.cf.blocksize">1.6.5. ColumnFamily BlockSize</a></span></dt><dt><span class="section"><a href="#cf.in.memory">1.6.6. In-Memory ColumnFamilies</a></span></dt><dt><span class="section"><a href="#perf.compression">1.6.7. Compression</a></span></dt></dl></dd><dt><span class="section"><a href="#perf.writing">1.7. Writing to HBase</a></span></dt><dd><dl><dt><sp
 an class="section"><a href="#perf.batch.loading">1.7.1. Batch Loading</a></span></dt><dt><span class="section"><a href="#precreate.regions">1.7.2. 
+   <title>Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning</title><link rel="stylesheet" type="text/css" href="css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="chapter" title="Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning"><div class="titlepage"><div><div><h2 class="title"><a name="performance"></a>Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning</h2></div></div></div><div class="toc"><p><b>Table of Contents</b></p><dl><dt><span class="section"><a href="#perf.os">1.1. Operating System</a></span></dt><dd><dl><dt><span class="section"><a href="#perf.os.ram">1.1.1. Memory</a></span></dt><dt><span class="section"><a href="#perf.os.64">1.1.2. 64-bit</a></span></dt><dt><span class="section"><a href="#perf.os.swap">1.1.3. Swapping</a></span></dt></dl></dd><dt><span class="section"><a href="#perf.network">1.2. 
 Network</a></span></dt><dd><dl><dt><span class="section"><a href="#perf.network.1switch">1.2.1. Single Switch</a></span></dt><dt><span class="section"><a href="#perf.network.2switch">1.2.2. Multiple Switches</a></span></dt><dt><span class="section"><a href="#perf.network.multirack">1.2.3. Multiple Racks</a></span></dt><dt><span class="section"><a href="#perf.network.ints">1.2.4. Network Interfaces</a></span></dt></dl></dd><dt><span class="section"><a href="#jvm">1.3. Java</a></span></dt><dd><dl><dt><span class="section"><a href="#gc">1.3.1. The Garbage Collector and Apache HBase</a></span></dt></dl></dd><dt><span class="section"><a href="#perf.configurations">1.4. HBase Configurations</a></span></dt><dd><dl><dt><span class="section"><a href="#perf.number.of.regions">1.4.1. Number of Regions</a></span></dt><dt><span class="section"><a href="#perf.compactions.and.splits">1.4.2. Managing Compactions</a></span></dt><dt><span class="section"><a href="#perf.handlers">1.4.3. <code 
 class="varname">hbase.regionserver.handler.count</code></a></span></dt><dt><span class="section"><a href="#perf.hfile.block.cache.size">1.4.4. <code class="varname">hfile.block.cache.size</code></a></span></dt><dt><span class="section"><a href="#perf.rs.memstore.upperlimit">1.4.5. <code class="varname">hbase.regionserver.global.memstore.upperLimit</code></a></span></dt><dt><span class="section"><a href="#perf.rs.memstore.lowerlimit">1.4.6. <code class="varname">hbase.regionserver.global.memstore.lowerLimit</code></a></span></dt><dt><span class="section"><a href="#perf.hstore.blockingstorefiles">1.4.7. <code class="varname">hbase.hstore.blockingStoreFiles</code></a></span></dt><dt><span class="section"><a href="#perf.hregion.memstore.block.multiplier">1.4.8. <code class="varname">hbase.hregion.memstore.block.multiplier</code></a></span></dt><dt><span class="section"><a href="#hbase.regionserver.checksum.verify">1.4.9. <code class="varname">hbase.regionserver.checksum.verify</
 code></a></span></dt></dl></dd><dt><span class="section"><a href="#perf.zookeeper">1.5. ZooKeeper</a></span></dt><dt><span class="section"><a href="#perf.schema">1.6. Schema Design</a></span></dt><dd><dl><dt><span class="section"><a href="#perf.number.of.cfs">1.6.1. Number of Column Families</a></span></dt><dt><span class="section"><a href="#perf.schema.keys">1.6.2. Key and Attribute Lengths</a></span></dt><dt><span class="section"><a href="#schema.regionsize">1.6.3. Table RegionSize</a></span></dt><dt><span class="section"><a href="#schema.bloom">1.6.4. Bloom Filters</a></span></dt><dt><span class="section"><a href="#schema.cf.blocksize">1.6.5. ColumnFamily BlockSize</a></span></dt><dt><span class="section"><a href="#cf.in.memory">1.6.6. In-Memory ColumnFamilies</a></span></dt><dt><span class="section"><a href="#perf.compression">1.6.7. Compression</a></span></dt></dl></dd><dt><span class="section"><a href="#perf.general">1.7. HBase General Patterns</a></span></dt><dd><dl><
 dt><span class="section"><a href="#perf.general.constants">1.7.1. Constants</a></span></dt></dl></dd><dt><span class="section"><a href="#perf.writing">1.8. Writing to HBase</a></span></dt><dd><dl><dt><span class="section"><a href="#perf.batch.loading">1.8.1. Batch Loading</a></span></dt><dt><span class="section"><a href="#precreate.regions">1.8.2. 
     Table Creation: Pre-Creating Regions
-    </a></span></dt><dt><span class="section"><a href="#def.log.flush">1.7.3. 
+    </a></span></dt><dt><span class="section"><a href="#def.log.flush">1.8.3. 
     Table Creation: Deferred Log Flush
-    </a></span></dt><dt><span class="section"><a href="#perf.hbase.client.autoflush">1.7.4. HBase Client:  AutoFlush</a></span></dt><dt><span class="section"><a href="#perf.hbase.client.putwal">1.7.5. HBase Client:  Turn off WAL on Puts</a></span></dt><dt><span class="section"><a href="#perf.hbase.client.regiongroup">1.7.6. HBase Client: Group Puts by RegionServer</a></span></dt><dt><span class="section"><a href="#perf.hbase.write.mr.reducer">1.7.7. MapReduce:  Skip The Reducer</a></span></dt><dt><span class="section"><a href="#perf.one.region">1.7.8. Anti-Pattern:  One Hot Region</a></span></dt></dl></dd><dt><span class="section"><a href="#perf.reading">1.8. Reading from HBase</a></span></dt><dd><dl><dt><span class="section"><a href="#perf.hbase.client.caching">1.8.1. Scan Caching</a></span></dt><dt><span class="section"><a href="#perf.hbase.client.selection">1.8.2. Scan Attribute Selection</a></span></dt><dt><span class="section"><a href="#perf.hbase.mr.input">1.8.3. MapRe
 duce - Input Splits</a></span></dt><dt><span class="section"><a href="#perf.hbase.client.scannerclose">1.8.4. Close ResultScanners</a></span></dt><dt><span class="section"><a href="#perf.hbase.client.blockcache">1.8.5. Block Cache</a></span></dt><dt><span class="section"><a href="#perf.hbase.client.rowkeyonly">1.8.6. Optimal Loading of Row Keys</a></span></dt><dt><span class="section"><a href="#perf.hbase.read.dist">1.8.7. Concurrency:  Monitor Data Spread</a></span></dt><dt><span class="section"><a href="#blooms">1.8.8. Bloom Filters</a></span></dt></dl></dd><dt><span class="section"><a href="#perf.deleting">1.9. Deleting from HBase</a></span></dt><dd><dl><dt><span class="section"><a href="#perf.deleting.queue">1.9.1. Using HBase Tables as Queues</a></span></dt><dt><span class="section"><a href="#perf.deleting.rpc">1.9.2. Delete RPC Behavior</a></span></dt></dl></dd><dt><span class="section"><a href="#perf.hdfs">1.10. HDFS</a></span></dt><dd><dl><dt><span class="section"><a
  href="#perf.hdfs.curr">1.10.1. Current Issues With Low-Latency Reads</a></span></dt><dt><span class="section"><a href="#perf.hdfs.configs.localread">1.10.2. Leveraging local data</a></span></dt><dt><span class="section"><a href="#perf.hdfs.comp">1.10.3. Performance Comparisons of HBase vs. HDFS</a></span></dt></dl></dd><dt><span class="section"><a href="#perf.ec2">1.11. Amazon EC2</a></span></dt><dt><span class="section"><a href="#perf.casestudy">1.12. Case Studies</a></span></dt></dl></div><div class="section" title="1.1.&nbsp;Operating System"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.os"></a>1.1.&nbsp;Operating System</h2></div></div></div><div class="section" title="1.1.1.&nbsp;Memory"><div class="titlepage"><div><div><h3 class="title"><a name="perf.os.ram"></a>1.1.1.&nbsp;Memory</h3></div></div></div><p>RAM, RAM, RAM.  Don't starve HBase.</p></div><div class="section" title="1.1.2.&nbsp;64-bit"><div class="titlepage"><div><div
 ><h3 class="title"><a name="perf.os.64"></a>1.1.2.&nbsp;64-bit</h3></div></div></div><p>Use a 64-bit platform (and 64-bit JVM).</p></div><div class="section" title="1.1.3.&nbsp;Swapping"><div class="titlepage"><div><div><h3 class="title"><a name="perf.os.swap"></a>1.1.3.&nbsp;Swapping</h3></div></div></div><p>Watch out for swapping.  Set swappiness to 0.</p></div></div><div class="section" title="1.2.&nbsp;Network"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.network"></a>1.2.&nbsp;Network</h2></div></div></div><p>
+    </a></span></dt><dt><span class="section"><a href="#perf.hbase.client.autoflush">1.8.4. HBase Client:  AutoFlush</a></span></dt><dt><span class="section"><a href="#perf.hbase.client.putwal">1.8.5. HBase Client:  Turn off WAL on Puts</a></span></dt><dt><span class="section"><a href="#perf.hbase.client.regiongroup">1.8.6. HBase Client: Group Puts by RegionServer</a></span></dt><dt><span class="section"><a href="#perf.hbase.write.mr.reducer">1.8.7. MapReduce:  Skip The Reducer</a></span></dt><dt><span class="section"><a href="#perf.one.region">1.8.8. Anti-Pattern:  One Hot Region</a></span></dt></dl></dd><dt><span class="section"><a href="#perf.reading">1.9. Reading from HBase</a></span></dt><dd><dl><dt><span class="section"><a href="#perf.hbase.client.caching">1.9.1. Scan Caching</a></span></dt><dt><span class="section"><a href="#perf.hbase.client.selection">1.9.2. Scan Attribute Selection</a></span></dt><dt><span class="section"><a href="#perf.hbase.mr.input">1.9.3. MapRe
 duce - Input Splits</a></span></dt><dt><span class="section"><a href="#perf.hbase.client.scannerclose">1.9.4. Close ResultScanners</a></span></dt><dt><span class="section"><a href="#perf.hbase.client.blockcache">1.9.5. Block Cache</a></span></dt><dt><span class="section"><a href="#perf.hbase.client.rowkeyonly">1.9.6. Optimal Loading of Row Keys</a></span></dt><dt><span class="section"><a href="#perf.hbase.read.dist">1.9.7. Concurrency:  Monitor Data Spread</a></span></dt><dt><span class="section"><a href="#blooms">1.9.8. Bloom Filters</a></span></dt></dl></dd><dt><span class="section"><a href="#perf.deleting">1.10. Deleting from HBase</a></span></dt><dd><dl><dt><span class="section"><a href="#perf.deleting.queue">1.10.1. Using HBase Tables as Queues</a></span></dt><dt><span class="section"><a href="#perf.deleting.rpc">1.10.2. Delete RPC Behavior</a></span></dt></dl></dd><dt><span class="section"><a href="#perf.hdfs">1.11. HDFS</a></span></dt><dd><dl><dt><span class="section"
 ><a href="#perf.hdfs.curr">1.11.1. Current Issues With Low-Latency Reads</a></span></dt><dt><span class="section"><a href="#perf.hdfs.configs.localread">1.11.2. Leveraging local data</a></span></dt><dt><span class="section"><a href="#perf.hdfs.comp">1.11.3. Performance Comparisons of HBase vs. HDFS</a></span></dt></dl></dd><dt><span class="section"><a href="#perf.ec2">1.12. Amazon EC2</a></span></dt><dt><span class="section"><a href="#perf.casestudy">1.13. Case Studies</a></span></dt></dl></div><div class="section" title="1.1.&nbsp;Operating System"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.os"></a>1.1.&nbsp;Operating System</h2></div></div></div><div class="section" title="1.1.1.&nbsp;Memory"><div class="titlepage"><div><div><h3 class="title"><a name="perf.os.ram"></a>1.1.1.&nbsp;Memory</h3></div></div></div><p>RAM, RAM, RAM.  Don't starve HBase.</p></div><div class="section" title="1.1.2.&nbsp;64-bit"><div class="titlepage"><div><
 div><h3 class="title"><a name="perf.os.64"></a>1.1.2.&nbsp;64-bit</h3></div></div></div><p>Use a 64-bit platform (and 64-bit JVM).</p></div><div class="section" title="1.1.3.&nbsp;Swapping"><div class="titlepage"><div><div><h3 class="title"><a name="perf.os.swap"></a>1.1.3.&nbsp;Swapping</h3></div></div></div><p>Watch out for swapping.  Set swappiness to 0.</p></div></div><div class="section" title="1.2.&nbsp;Network"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.network"></a>1.2.&nbsp;Network</h2></div></div></div><p>
     Perhaps the most important factor in avoiding network issues degrading Hadoop and HBbase performance is the switching hardware
     that is used, decisions made early in the scope of the project can cause major problems when you double or triple the size of your cluster (or more).
     </p><p>
@@ -80,7 +80,7 @@
         on each insert. If <code class="varname">ROWCOL</code>, the hash of the row +
         column family + column family qualifier will be added to the bloom on
         each key insert.</p><p>See <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HColumnDescriptor.html" target="_top">HColumnDescriptor</a> and
-    <a class="xref" href="#blooms" title="1.8.8.&nbsp;Bloom Filters">Section&nbsp;1.8.8, &#8220;Bloom Filters&#8221;</a> for more information or this answer up in quora,
+    <a class="xref" href="#blooms" title="1.9.8.&nbsp;Bloom Filters">Section&nbsp;1.9.8, &#8220;Bloom Filters&#8221;</a> for more information or this answer up in quora,
 <a class="link" href="http://www.quora.com/How-are-bloom-filters-used-in-HBase" target="_top">How are bloom filters used in HBase?</a>.
     </p></div><div class="section" title="1.6.5.&nbsp;ColumnFamily BlockSize"><div class="titlepage"><div><div><h3 class="title"><a name="schema.cf.blocksize"></a>1.6.5.&nbsp;ColumnFamily BlockSize</h3></div></div></div><p>The blocksize can be configured for each ColumnFamily in a table, and this defaults to 64k.  Larger cell values require larger blocksizes.
     There is an inverse relationship between blocksize and the resulting StoreFile indexes (i.e., if the blocksize is doubled then the resulting
@@ -97,10 +97,27 @@
          So while using ColumnFamily compression is a best practice, but it's not going to completely eliminate
          the impact of over-sized Keys, over-sized ColumnFamily names, or over-sized Column names.
          </p><p>See <a class="xref" href="#">???</a> on for schema design tips, and <a class="xref" href="#">???</a> for more information on HBase stores data internally.
-         </p></div></div></div><div class="section" title="1.7.&nbsp;Writing to HBase"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.writing"></a>1.7.&nbsp;Writing to HBase</h2></div></div></div><div class="section" title="1.7.1.&nbsp;Batch Loading"><div class="titlepage"><div><div><h3 class="title"><a name="perf.batch.loading"></a>1.7.1.&nbsp;Batch Loading</h3></div></div></div><p>Use the bulk load tool if you can.  See
+         </p></div></div></div><div class="section" title="1.7.&nbsp;HBase General Patterns"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.general"></a>1.7.&nbsp;HBase General Patterns</h2></div></div></div><div class="section" title="1.7.1.&nbsp;Constants"><div class="titlepage"><div><div><h3 class="title"><a name="perf.general.constants"></a>1.7.1.&nbsp;Constants</h3></div></div></div><p>When people get started with HBase they have a tendency to write code that looks like this:
+</p><pre class="programlisting">
+Get get = new Get(rowkey);
+Result r = htable.get(get);
+byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // returns current version of value
+</pre><p>
+		But especially when inside loops (and MapReduce jobs), converting the columnFamily and column-names
+		to byte-arrays repeatedly is surprisingly expensive.
+		It's better to use constants for the byte-arrays, like this:
+</p><pre class="programlisting">
+public static final byte[] CF = "cf".getBytes();
+public static final byte[] ATTR = "attr".getBytes();
+...
+Get get = new Get(rowkey);
+Result r = htable.get(get);
+byte[] b = r.getValue(CF, ATTR);  // returns current version of value
+</pre><p>
+      </p></div></div><div class="section" title="1.8.&nbsp;Writing to HBase"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.writing"></a>1.8.&nbsp;Writing to HBase</h2></div></div></div><div class="section" title="1.8.1.&nbsp;Batch Loading"><div class="titlepage"><div><div><h3 class="title"><a name="perf.batch.loading"></a>1.8.1.&nbsp;Batch Loading</h3></div></div></div><p>Use the bulk load tool if you can.  See
         <a class="xref" href="#">???</a>.
         Otherwise, pay attention to the below.
-      </p></div><div class="section" title="1.7.2.&nbsp; Table Creation: Pre-Creating Regions"><div class="titlepage"><div><div><h3 class="title"><a name="precreate.regions"></a>1.7.2.&nbsp;
+      </p></div><div class="section" title="1.8.2.&nbsp; Table Creation: Pre-Creating Regions"><div class="titlepage"><div><div><h3 class="title"><a name="precreate.regions"></a>1.8.2.&nbsp;
     Table Creation: Pre-Creating Regions
     </h3></div></div></div><p>
 Tables in HBase are initially created with one region by default.  For bulk imports, this means that all clients will write to the same region 
@@ -119,7 +136,7 @@ byte[][] splits = ...;   // create your 
 admin.createTable(table, splits);
 </pre><p>
    See <a class="xref" href="#">???</a> for issues related to understanding your keyspace and pre-creating regions.
-  </p></div><div class="section" title="1.7.3.&nbsp; Table Creation: Deferred Log Flush"><div class="titlepage"><div><div><h3 class="title"><a name="def.log.flush"></a>1.7.3.&nbsp;
+  </p></div><div class="section" title="1.8.3.&nbsp; Table Creation: Deferred Log Flush"><div class="titlepage"><div><div><h3 class="title"><a name="def.log.flush"></a>1.8.3.&nbsp;
     Table Creation: Deferred Log Flush
     </h3></div></div></div><p>
 The default behavior for Puts using the Write Ahead Log (WAL) is that <code class="classname">HLog</code> edits will be written immediately.  If deferred log flush is used,
@@ -127,7 +144,7 @@ WAL edits are kept in memory until the f
  the RegionServer goes down the yet-to-be-flushed edits are lost.  This is safer, however, than not using WAL at all with Puts.
 </p><p>
 Deferred log flush can be configured on tables via <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html" target="_top">HTableDescriptor</a>.  The default value of <code class="varname">hbase.regionserver.optionallogflushinterval</code> is 1000ms.
-</p></div><div class="section" title="1.7.4.&nbsp;HBase Client: AutoFlush"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.client.autoflush"></a>1.7.4.&nbsp;HBase Client:  AutoFlush</h3></div></div></div><p>When performing a lot of Puts, make sure that setAutoFlush is set
+</p></div><div class="section" title="1.8.4.&nbsp;HBase Client: AutoFlush"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.client.autoflush"></a>1.8.4.&nbsp;HBase Client:  AutoFlush</h3></div></div></div><p>When performing a lot of Puts, make sure that setAutoFlush is set
       to false on your <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html" target="_top">HTable</a>
       instance. Otherwise, the Puts will be sent one at a time to the
       RegionServer. Puts added via <code class="code"> htable.add(Put)</code> and <code class="code"> htable.add( &lt;List&gt; Put)</code>
@@ -135,31 +152,31 @@ Deferred log flush can be configured on 
       these messages are not sent until the write-buffer is filled. To
       explicitly flush the messages, call <code class="methodname">flushCommits</code>.
       Calling <code class="methodname">close</code> on the <code class="classname">HTable</code>
-      instance will invoke <code class="methodname">flushCommits</code>.</p></div><div class="section" title="1.7.5.&nbsp;HBase Client: Turn off WAL on Puts"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.client.putwal"></a>1.7.5.&nbsp;HBase Client:  Turn off WAL on Puts</h3></div></div></div><p>A frequently discussed option for increasing throughput on <code class="classname">Put</code>s is to call <code class="code">writeToWAL(false)</code>.  Turning this off means
+      instance will invoke <code class="methodname">flushCommits</code>.</p></div><div class="section" title="1.8.5.&nbsp;HBase Client: Turn off WAL on Puts"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.client.putwal"></a>1.8.5.&nbsp;HBase Client:  Turn off WAL on Puts</h3></div></div></div><p>A frequently discussed option for increasing throughput on <code class="classname">Put</code>s is to call <code class="code">writeToWAL(false)</code>.  Turning this off means
           that the RegionServer will <span class="emphasis"><em>not</em></span> write the <code class="classname">Put</code> to the Write Ahead Log,
           only into the memstore, HOWEVER the consequence is that if there
           is a RegionServer failure <span class="emphasis"><em>there will be data loss</em></span>.
           If <code class="code">writeToWAL(false)</code> is used, do so with extreme caution.  You may find in actuality that
           it makes little difference if your load is well distributed across the cluster.
       </p><p>In general, it is best to use WAL for Puts, and where loading throughput
-          is a concern to use <a class="link" href="#perf.batch.loading" title="1.7.1.&nbsp;Batch Loading">bulk loading</a> techniques instead.
-      </p></div><div class="section" title="1.7.6.&nbsp;HBase Client: Group Puts by RegionServer"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.client.regiongroup"></a>1.7.6.&nbsp;HBase Client: Group Puts by RegionServer</h3></div></div></div><p>In addition to using the writeBuffer, grouping <code class="classname">Put</code>s by RegionServer can reduce the number of client RPC calls per writeBuffer flush.
+          is a concern to use <a class="link" href="#perf.batch.loading" title="1.8.1.&nbsp;Batch Loading">bulk loading</a> techniques instead.
+      </p></div><div class="section" title="1.8.6.&nbsp;HBase Client: Group Puts by RegionServer"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.client.regiongroup"></a>1.8.6.&nbsp;HBase Client: Group Puts by RegionServer</h3></div></div></div><p>In addition to using the writeBuffer, grouping <code class="classname">Put</code>s by RegionServer can reduce the number of client RPC calls per writeBuffer flush.
       There is a utility <code class="classname">HTableUtil</code> currently on TRUNK that does this, but you can either copy that or implement your own verison for
       those still on 0.90.x or earlier.
-      </p></div><div class="section" title="1.7.7.&nbsp;MapReduce: Skip The Reducer"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.write.mr.reducer"></a>1.7.7.&nbsp;MapReduce:  Skip The Reducer</h3></div></div></div><p>When writing a lot of data to an HBase table from a MR job (e.g., with <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html" target="_top">TableOutputFormat</a>), and specifically where Puts are being emitted
+      </p></div><div class="section" title="1.8.7.&nbsp;MapReduce: Skip The Reducer"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.write.mr.reducer"></a>1.8.7.&nbsp;MapReduce:  Skip The Reducer</h3></div></div></div><p>When writing a lot of data to an HBase table from a MR job (e.g., with <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html" target="_top">TableOutputFormat</a>), and specifically where Puts are being emitted
       from the Mapper, skip the Reducer step.  When a Reducer step is used, all of the output (Puts) from the Mapper will get spooled to disk, then sorted/shuffled to other
       Reducers that will most likely be off-node.  It's far more efficient to just write directly to HBase.
       </p><p>For summary jobs where HBase is used as a source and a sink, then writes will be coming from the Reducer step (e.g., summarize values then write out result).
       This is a different processing problem than from the the above case.
-      </p></div><div class="section" title="1.7.8.&nbsp;Anti-Pattern: One Hot Region"><div class="titlepage"><div><div><h3 class="title"><a name="perf.one.region"></a>1.7.8.&nbsp;Anti-Pattern:  One Hot Region</h3></div></div></div><p>If all your data is being written to one region at a time, then re-read the
+      </p></div><div class="section" title="1.8.8.&nbsp;Anti-Pattern: One Hot Region"><div class="titlepage"><div><div><h3 class="title"><a name="perf.one.region"></a>1.8.8.&nbsp;Anti-Pattern:  One Hot Region</h3></div></div></div><p>If all your data is being written to one region at a time, then re-read the
     section on processing <a class="link" href="#">timeseries</a> data.</p><p>Also, if you are pre-splitting regions and all your data is <span class="emphasis"><em>still</em></span> winding up in a single region even though
     your keys aren't monotonically increasing, confirm that your keyspace actually works with the split strategy.  There are a
     variety of reasons that regions may appear "well split" but won't work with your data.   As
     the HBase client communicates directly with the RegionServers, this can be obtained via
     <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29" target="_top">HTable.getRegionLocation</a>.
-    </p><p>See <a class="xref" href="#precreate.regions" title="1.7.2.&nbsp; Table Creation: Pre-Creating Regions">Section&nbsp;1.7.2, &#8220;
+    </p><p>See <a class="xref" href="#precreate.regions" title="1.8.2.&nbsp; Table Creation: Pre-Creating Regions">Section&nbsp;1.8.2, &#8220;
     Table Creation: Pre-Creating Regions
-    &#8221;</a>, as well as <a class="xref" href="#perf.configurations" title="1.4.&nbsp;HBase Configurations">Section&nbsp;1.4, &#8220;HBase Configurations&#8221;</a> </p></div></div><div class="section" title="1.8.&nbsp;Reading from HBase"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.reading"></a>1.8.&nbsp;Reading from HBase</h2></div></div></div><div class="section" title="1.8.1.&nbsp;Scan Caching"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.client.caching"></a>1.8.1.&nbsp;Scan Caching</h3></div></div></div><p>If HBase is used as an input source for a MapReduce job, for
+    &#8221;</a>, as well as <a class="xref" href="#perf.configurations" title="1.4.&nbsp;HBase Configurations">Section&nbsp;1.4, &#8220;HBase Configurations&#8221;</a> </p></div></div><div class="section" title="1.9.&nbsp;Reading from HBase"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.reading"></a>1.9.&nbsp;Reading from HBase</h2></div></div></div><div class="section" title="1.9.1.&nbsp;Scan Caching"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.client.caching"></a>1.9.1.&nbsp;Scan Caching</h3></div></div></div><p>If HBase is used as an input source for a MapReduce job, for
       example, make sure that the input <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_top">Scan</a>
       instance to the MapReduce job has <code class="methodname">setCaching</code> set to something greater
       than the default (which is 1). Using the default value means that the
@@ -167,22 +184,22 @@ Deferred log flush can be configured on 
       processed. Setting this value to 500, for example, will transfer 500
      rows at a time to the client to be processed. There is a cost/benefit trade-off in
      making the cache value large, because it costs more in memory for both
-      client and RegionServer, so bigger isn't always better.</p><div class="section" title="1.8.1.1.&nbsp;Scan Caching in MapReduce Jobs"><div class="titlepage"><div><div><h4 class="title"><a name="perf.hbase.client.caching.mr"></a>1.8.1.1.&nbsp;Scan Caching in MapReduce Jobs</h4></div></div></div><p>Scan settings in MapReduce jobs deserve special attention.  Timeouts can result (e.g., UnknownScannerException)
+      client and RegionServer, so bigger isn't always better.</p><div class="section" title="1.9.1.1.&nbsp;Scan Caching in MapReduce Jobs"><div class="titlepage"><div><div><h4 class="title"><a name="perf.hbase.client.caching.mr"></a>1.9.1.1.&nbsp;Scan Caching in MapReduce Jobs</h4></div></div></div><p>Scan settings in MapReduce jobs deserve special attention.  Timeouts can result (e.g., UnknownScannerException)
         in Map tasks if it takes too long to process a batch of records before the client goes back to the RegionServer for the
         next set of data.  This problem can occur because there is non-trivial processing occurring per row.  If you process
         rows quickly, set caching higher.  If you process rows more slowly (e.g., lots of transformations per row, writes),
         then set caching lower.
         </p><p>Timeouts can also happen in a non-MapReduce use case (e.g., a single-threaded HBase client doing a Scan), but the
         processing that is often performed in MapReduce jobs tends to exacerbate this issue.
-        </p></div></div><div class="section" title="1.8.2.&nbsp;Scan Attribute Selection"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.client.selection"></a>1.8.2.&nbsp;Scan Attribute Selection</h3></div></div></div><p>Whenever a Scan is used to process large numbers of rows (and especially when used
+        </p></div></div><div class="section" title="1.9.2.&nbsp;Scan Attribute Selection"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.client.selection"></a>1.9.2.&nbsp;Scan Attribute Selection</h3></div></div></div><p>Whenever a Scan is used to process large numbers of rows (and especially when used
       as a MapReduce source), be aware of which attributes are selected.   If <code class="code">scan.addFamily</code> is called
       then <span class="emphasis"><em>all</em></span> of the attributes in the specified ColumnFamily will be returned to the client.
       If only a small number of the available attributes are to be processed, then only those attributes should be specified
      in the input scan because attribute over-selection incurs a non-trivial performance penalty over large datasets.
-      </p></div><div class="section" title="1.8.3.&nbsp;MapReduce - Input Splits"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.mr.input"></a>1.8.3.&nbsp;MapReduce - Input Splits</h3></div></div></div><p>For MapReduce jobs that use HBase tables as a source, if there is a pattern where the "slow" map tasks seem to
+      </p></div><div class="section" title="1.9.3.&nbsp;MapReduce - Input Splits"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.mr.input"></a>1.9.3.&nbsp;MapReduce - Input Splits</h3></div></div></div><p>For MapReduce jobs that use HBase tables as a source, if there is a pattern where the "slow" map tasks seem to
         have the same Input Split (i.e., the RegionServer serving the data), see the
         Troubleshooting Case Study in <a class="xref" href="#">???</a>.
-        </p></div><div class="section" title="1.8.4.&nbsp;Close ResultScanners"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.client.scannerclose"></a>1.8.4.&nbsp;Close ResultScanners</h3></div></div></div><p>This isn't so much about improving performance but rather
+        </p></div><div class="section" title="1.9.4.&nbsp;Close ResultScanners"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.client.scannerclose"></a>1.9.4.&nbsp;Close ResultScanners</h3></div></div></div><p>This isn't so much about improving performance but rather
       <span class="emphasis"><em>avoiding</em></span> performance problems. If you forget to
       close <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/ResultScanner.html" target="_top">ResultScanners</a>
       you can cause problems on the RegionServers. Always have ResultScanner
@@ -196,65 +213,65 @@ try {
 } finally {
   rs.close();  // always close the ResultScanner!
 }
-htable.close();</pre></div><div class="section" title="1.8.5.&nbsp;Block Cache"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.client.blockcache"></a>1.8.5.&nbsp;Block Cache</h3></div></div></div><p><a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_top">Scan</a>
+htable.close();</pre></div><div class="section" title="1.9.5.&nbsp;Block Cache"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.client.blockcache"></a>1.9.5.&nbsp;Block Cache</h3></div></div></div><p><a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_top">Scan</a>
       instances can be set to use the block cache in the RegionServer via the
       <code class="methodname">setCacheBlocks</code> method. For input Scans to MapReduce jobs, this should be
       <code class="varname">false</code>. For frequently accessed rows, it is advisable to use the block
-      cache.</p></div><div class="section" title="1.8.6.&nbsp;Optimal Loading of Row Keys"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.client.rowkeyonly"></a>1.8.6.&nbsp;Optimal Loading of Row Keys</h3></div></div></div><p>When performing a table <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_top">scan</a>
+      cache.</p></div><div class="section" title="1.9.6.&nbsp;Optimal Loading of Row Keys"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.client.rowkeyonly"></a>1.9.6.&nbsp;Optimal Loading of Row Keys</h3></div></div></div><p>When performing a table <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_top">scan</a>
             where only the row keys are needed (no families, qualifiers, values or timestamps), add a FilterList with a
             <code class="varname">MUST_PASS_ALL</code> operator to the scanner using <code class="methodname">setFilter</code>. The filter list
             should include both a <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/FirstKeyOnlyFilter.html" target="_top">FirstKeyOnlyFilter</a>
             and a <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/KeyOnlyFilter.html" target="_top">KeyOnlyFilter</a>.
             Using this filter combination will result in a worst case scenario of a RegionServer reading a single value from disk
             and minimal network traffic to the client for a single row.
-      </p></div><div class="section" title="1.8.7.&nbsp;Concurrency: Monitor Data Spread"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.read.dist"></a>1.8.7.&nbsp;Concurrency:  Monitor Data Spread</h3></div></div></div><p>When performing a high number of concurrent reads, monitor the data spread of the target tables.  If the target table(s) have
-      too few regions then the reads could likely be served from too few nodes.  </p><p>See <a class="xref" href="#precreate.regions" title="1.7.2.&nbsp; Table Creation: Pre-Creating Regions">Section&nbsp;1.7.2, &#8220;
+      </p></div><div class="section" title="1.9.7.&nbsp;Concurrency: Monitor Data Spread"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hbase.read.dist"></a>1.9.7.&nbsp;Concurrency:  Monitor Data Spread</h3></div></div></div><p>When performing a high number of concurrent reads, monitor the data spread of the target tables.  If the target table(s) have
+      too few regions then the reads could likely be served from too few nodes.  </p><p>See <a class="xref" href="#precreate.regions" title="1.8.2.&nbsp; Table Creation: Pre-Creating Regions">Section&nbsp;1.8.2, &#8220;
     Table Creation: Pre-Creating Regions
-    &#8221;</a>, as well as <a class="xref" href="#perf.configurations" title="1.4.&nbsp;HBase Configurations">Section&nbsp;1.4, &#8220;HBase Configurations&#8221;</a> </p></div><div class="section" title="1.8.8.&nbsp;Bloom Filters"><div class="titlepage"><div><div><h3 class="title"><a name="blooms"></a>1.8.8.&nbsp;Bloom Filters</h3></div></div></div><p>Enabling Bloom Filters can save you from having to go to disk and
+    &#8221;</a>, as well as <a class="xref" href="#perf.configurations" title="1.4.&nbsp;HBase Configurations">Section&nbsp;1.4, &#8220;HBase Configurations&#8221;</a> </p></div><div class="section" title="1.9.8.&nbsp;Bloom Filters"><div class="titlepage"><div><div><h3 class="title"><a name="blooms"></a>1.9.8.&nbsp;Bloom Filters</h3></div></div></div><p>Enabling Bloom Filters can save you from having to go to disk and
          can help improve read latencies.</p><p><a class="link" href="http://en.wikipedia.org/wiki/Bloom_filter" target="_top">Bloom filters</a> were developed in <a class="link" href="https://issues.apache.org/jira/browse/HBASE-1200" target="_top">HBase-1200
-    Add bloomfilters</a>.<sup>[<a name="d656e573" href="#ftn.d656e573" class="footnote">2</a>]</sup><sup>[<a name="d656e585" href="#ftn.d656e585" class="footnote">3</a>]</sup></p><p>See also <a class="xref" href="#schema.bloom" title="1.6.4.&nbsp;Bloom Filters">Section&nbsp;1.6.4, &#8220;Bloom Filters&#8221;</a>.
-        </p><div class="section" title="1.8.8.1.&nbsp;Bloom StoreFile footprint"><div class="titlepage"><div><div><h4 class="title"><a name="bloom_footprint"></a>1.8.8.1.&nbsp;Bloom StoreFile footprint</h4></div></div></div><p>Bloom filters add an entry to the <code class="classname">StoreFile</code>
+    Add bloomfilters</a>.<sup>[<a name="d656e587" href="#ftn.d656e587" class="footnote">2</a>]</sup><sup>[<a name="d656e599" href="#ftn.d656e599" class="footnote">3</a>]</sup></p><p>See also <a class="xref" href="#schema.bloom" title="1.6.4.&nbsp;Bloom Filters">Section&nbsp;1.6.4, &#8220;Bloom Filters&#8221;</a>.
+        </p><div class="section" title="1.9.8.1.&nbsp;Bloom StoreFile footprint"><div class="titlepage"><div><div><h4 class="title"><a name="bloom_footprint"></a>1.9.8.1.&nbsp;Bloom StoreFile footprint</h4></div></div></div><p>Bloom filters add an entry to the <code class="classname">StoreFile</code>
       general <code class="classname">FileInfo</code> data structure and then two
       extra entries to the <code class="classname">StoreFile</code> metadata
-      section.</p><div class="section" title="1.8.8.1.1.&nbsp;BloomFilter in the StoreFile FileInfo data structure"><div class="titlepage"><div><div><h5 class="title"><a name="d656e609"></a>1.8.8.1.1.&nbsp;BloomFilter in the <code class="classname">StoreFile</code>
+      section.</p><div class="section" title="1.9.8.1.1.&nbsp;BloomFilter in the StoreFile FileInfo data structure"><div class="titlepage"><div><div><h5 class="title"><a name="d656e623"></a>1.9.8.1.1.&nbsp;BloomFilter in the <code class="classname">StoreFile</code>
         <code class="classname">FileInfo</code> data structure</h5></div></div></div><p><code class="classname">FileInfo</code> has a
           <code class="varname">BLOOM_FILTER_TYPE</code> entry which is set to
           <code class="varname">NONE</code>, <code class="varname">ROW</code> or
-          <code class="varname">ROWCOL.</code></p></div><div class="section" title="1.8.8.1.2.&nbsp;BloomFilter entries in StoreFile metadata"><div class="titlepage"><div><div><h5 class="title"><a name="d656e633"></a>1.8.8.1.2.&nbsp;BloomFilter entries in <code class="classname">StoreFile</code>
+          <code class="varname">ROWCOL.</code></p></div><div class="section" title="1.9.8.1.2.&nbsp;BloomFilter entries in StoreFile metadata"><div class="titlepage"><div><div><h5 class="title"><a name="d656e647"></a>1.9.8.1.2.&nbsp;BloomFilter entries in <code class="classname">StoreFile</code>
         metadata</h5></div></div></div><p><code class="varname">BLOOM_FILTER_META</code> holds Bloom Size, Hash
          Function used, etc. It's small in size and is cached on
          <code class="classname">StoreFile.Reader</code> load.</p><p><code class="varname">BLOOM_FILTER_DATA</code> is the actual bloomfilter
          data. It is obtained on demand and stored in the LRU cache, if it is enabled
-          (it's enabled by default).</p></div></div><div class="section" title="1.8.8.2.&nbsp;Bloom Filter Configuration"><div class="titlepage"><div><div><h4 class="title"><a name="config.bloom"></a>1.8.8.2.&nbsp;Bloom Filter Configuration</h4></div></div></div><div class="section" title="1.8.8.2.1.&nbsp;io.hfile.bloom.enabled global kill switch"><div class="titlepage"><div><div><h5 class="title"><a name="d656e653"></a>1.8.8.2.1.&nbsp;<code class="varname">io.hfile.bloom.enabled</code> global kill
+          (it's enabled by default).</p></div></div><div class="section" title="1.9.8.2.&nbsp;Bloom Filter Configuration"><div class="titlepage"><div><div><h4 class="title"><a name="config.bloom"></a>1.9.8.2.&nbsp;Bloom Filter Configuration</h4></div></div></div><div class="section" title="1.9.8.2.1.&nbsp;io.hfile.bloom.enabled global kill switch"><div class="titlepage"><div><div><h5 class="title"><a name="d656e667"></a>1.9.8.2.1.&nbsp;<code class="varname">io.hfile.bloom.enabled</code> global kill
         switch</h5></div></div></div><p><code class="code">io.hfile.bloom.enabled</code> in
         <code class="classname">Configuration</code> serves as the kill switch in case
-        something goes wrong. Default = <code class="varname">true</code>.</p></div><div class="section" title="1.8.8.2.2.&nbsp;io.hfile.bloom.error.rate"><div class="titlepage"><div><div><h5 class="title"><a name="d656e668"></a>1.8.8.2.2.&nbsp;<code class="varname">io.hfile.bloom.error.rate</code></h5></div></div></div><p><code class="varname">io.hfile.bloom.error.rate</code> = average false
+        something goes wrong. Default = <code class="varname">true</code>.</p></div><div class="section" title="1.9.8.2.2.&nbsp;io.hfile.bloom.error.rate"><div class="titlepage"><div><div><h5 class="title"><a name="d656e682"></a>1.9.8.2.2.&nbsp;<code class="varname">io.hfile.bloom.error.rate</code></h5></div></div></div><p><code class="varname">io.hfile.bloom.error.rate</code> = average false
        positive rate. Default = 1%. Halving the rate (e.g., to 0.5%) costs roughly one extra
-        bit per bloom entry.</p></div><div class="section" title="1.8.8.2.3.&nbsp;io.hfile.bloom.max.fold"><div class="titlepage"><div><div><h5 class="title"><a name="d656e676"></a>1.8.8.2.3.&nbsp;<code class="varname">io.hfile.bloom.max.fold</code></h5></div></div></div><p><code class="varname">io.hfile.bloom.max.fold</code> = guaranteed minimum
+        bit per bloom entry.</p></div><div class="section" title="1.9.8.2.3.&nbsp;io.hfile.bloom.max.fold"><div class="titlepage"><div><div><h5 class="title"><a name="d656e690"></a>1.9.8.2.3.&nbsp;<code class="varname">io.hfile.bloom.max.fold</code></h5></div></div></div><p><code class="varname">io.hfile.bloom.max.fold</code> = guaranteed minimum
         fold rate. Most people should leave this alone. Default = 7, or can
         collapse to at least 1/128th of original size. See the
         <span class="emphasis"><em>Development Process</em></span> section of the document <a class="link" href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf" target="_top">BloomFilters
-        in HBase</a> for more on what this option means.</p></div></div></div></div><div class="section" title="1.9.&nbsp;Deleting from HBase"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.deleting"></a>1.9.&nbsp;Deleting from HBase</h2></div></div></div><div class="section" title="1.9.1.&nbsp;Using HBase Tables as Queues"><div class="titlepage"><div><div><h3 class="title"><a name="perf.deleting.queue"></a>1.9.1.&nbsp;Using HBase Tables as Queues</h3></div></div></div><p>HBase tables are sometimes used as queues.  In this case, special care must be taken to regularly perform major compactions on tables used in
+        in HBase</a> for more on what this option means.</p></div></div></div></div><div class="section" title="1.10.&nbsp;Deleting from HBase"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.deleting"></a>1.10.&nbsp;Deleting from HBase</h2></div></div></div><div class="section" title="1.10.1.&nbsp;Using HBase Tables as Queues"><div class="titlepage"><div><div><h3 class="title"><a name="perf.deleting.queue"></a>1.10.1.&nbsp;Using HBase Tables as Queues</h3></div></div></div><p>HBase tables are sometimes used as queues.  In this case, special care must be taken to regularly perform major compactions on tables used in
        this manner.  As is documented in <a class="xref" href="#">???</a>, marking rows as deleted creates additional StoreFiles which then need to be processed
        on reads.  Tombstones only get cleaned up with major compactions.
        </p><p>See also <a class="xref" href="#">???</a> and <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29" target="_top">HBaseAdmin.majorCompact</a>.
-       </p></div><div class="section" title="1.9.2.&nbsp;Delete RPC Behavior"><div class="titlepage"><div><div><h3 class="title"><a name="perf.deleting.rpc"></a>1.9.2.&nbsp;Delete RPC Behavior</h3></div></div></div><p>Be aware that <code class="code">htable.delete(Delete)</code> doesn't use the writeBuffer.  It will execute a RegionServer RPC with each invocation.
+       </p></div><div class="section" title="1.10.2.&nbsp;Delete RPC Behavior"><div class="titlepage"><div><div><h3 class="title"><a name="perf.deleting.rpc"></a>1.10.2.&nbsp;Delete RPC Behavior</h3></div></div></div><p>Be aware that <code class="code">htable.delete(Delete)</code> doesn't use the writeBuffer.  It will execute a RegionServer RPC with each invocation.
        For a large number of deletes, consider <code class="code">htable.delete(List)</code>.
        </p><p>See <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29" target="_top">http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29</a>
-       </p></div></div><div class="section" title="1.10.&nbsp;HDFS"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.hdfs"></a>1.10.&nbsp;HDFS</h2></div></div></div><p>Because HBase runs on <a class="xref" href="#">???</a> it is important to understand how it works and how it affects
+       </p></div></div><div class="section" title="1.11.&nbsp;HDFS"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.hdfs"></a>1.11.&nbsp;HDFS</h2></div></div></div><p>Because HBase runs on <a class="xref" href="#">???</a> it is important to understand how it works and how it affects
    HBase.
-   </p><div class="section" title="1.10.1.&nbsp;Current Issues With Low-Latency Reads"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hdfs.curr"></a>1.10.1.&nbsp;Current Issues With Low-Latency Reads</h3></div></div></div><p>The original use-case for HDFS was batch processing.  As such, low-latency reads were historically not a priority.
+   </p><div class="section" title="1.11.1.&nbsp;Current Issues With Low-Latency Reads"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hdfs.curr"></a>1.11.1.&nbsp;Current Issues With Low-Latency Reads</h3></div></div></div><p>The original use-case for HDFS was batch processing.  As such, low-latency reads were historically not a priority.
       With the increased adoption of Apache HBase this is changing, and several improvements are already in development.
       See the
       <a class="link" href="https://issues.apache.org/jira/browse/HDFS-1599" target="_top">Umbrella Jira Ticket for HDFS Improvements for HBase</a>.
-      </p></div><div class="section" title="1.10.2.&nbsp;Leveraging local data"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hdfs.configs.localread"></a>1.10.2.&nbsp;Leveraging local data</h3></div></div></div><p>Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
+      </p></div><div class="section" title="1.11.2.&nbsp;Leveraging local data"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hdfs.configs.localread"></a>1.11.2.&nbsp;Leveraging local data</h3></div></div></div><p>Since Hadoop 1.0.0 (also 0.22.1, 0.23.1, CDH3u3 and HDP 1.0) via
 <a class="link" href="https://issues.apache.org/jira/browse/HDFS-2246" target="_top">HDFS-2246</a>,
 it is possible for the DFSClient to take a "short circuit" and
 read directly from disk instead of going through the DataNode when the
 data is local. What this means for HBase is that the RegionServers can
 read directly off their machine's disks instead of having to open a
 socket to talk to the DataNode, the former being generally much
-faster<sup>[<a name="d656e748" href="#ftn.d656e748" class="footnote">4</a>]</sup>.
+faster<sup>[<a name="d656e762" href="#ftn.d656e762" class="footnote">4</a>]</sup>.
 Also see <a class="link" href="http://search-hadoop.com/m/zV6dKrLCVh1" target="_top">HBase, mail # dev - read short circuit</a> thread for
 more discussion around short circuit reads.
 </p><p>To enable "short circuit" reads, you must set two configurations.
@@ -273,33 +290,33 @@ configuration. Be aware that if a proces
 username than the one configured here also has the shortcircuit
 enabled, it will get an Exception regarding an unauthorized access but
 the data will still be read.
-</p></div><div class="section" title="1.10.3.&nbsp;Performance Comparisons of HBase vs. HDFS"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hdfs.comp"></a>1.10.3.&nbsp;Performance Comparisons of HBase vs. HDFS</h3></div></div></div><p>A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as
+</p></div><div class="section" title="1.11.3.&nbsp;Performance Comparisons of HBase vs. HDFS"><div class="titlepage"><div><div><h3 class="title"><a name="perf.hdfs.comp"></a>1.11.3.&nbsp;Performance Comparisons of HBase vs. HDFS</h3></div></div></div><p>A fairly common question on the dist-list is why HBase isn't as performant as HDFS files in a batch context (e.g., as
      a MapReduce source or sink).  The short answer is that HBase is doing a lot more than HDFS (e.g., reading the KeyValues,
      returning the most current row or specified timestamps, etc.), and as such HBase is 4-5 times slower than HDFS in this
      processing context.  Not that there isn't room for improvement (and this gap will, over time, be reduced), but HDFS
       will always be faster in this use-case.
-     </p></div></div><div class="section" title="1.11.&nbsp;Amazon EC2"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.ec2"></a>1.11.&nbsp;Amazon EC2</h2></div></div></div><p>Performance questions are common on Amazon EC2 environments because it is a shared environment.  You will
+     </p></div></div><div class="section" title="1.12.&nbsp;Amazon EC2"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.ec2"></a>1.12.&nbsp;Amazon EC2</h2></div></div></div><p>Performance questions are common on Amazon EC2 environments because it is a shared environment.  You will
    not see the same throughput as a dedicated server.  In terms of running tests on EC2, run them several times for the same
    reason (i.e., it's a shared environment and you don't know what else is happening on the server).
    </p><p>If you are running on EC2 and post performance questions on the dist-list, please state this fact up-front,
     because EC2 issues are practically a separate class of performance issues.
-   </p></div><div class="section" title="1.12.&nbsp;Case Studies"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.casestudy"></a>1.12.&nbsp;Case Studies</h2></div></div></div><p>For Performance and Troubleshooting Case Studies, see <a class="xref" href="#">???</a>.
+   </p></div><div class="section" title="1.13.&nbsp;Case Studies"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.casestudy"></a>1.13.&nbsp;Case Studies</h2></div></div></div><p>For Performance and Troubleshooting Case Studies, see <a class="xref" href="#">???</a>.
       </p></div><div class="footnotes"><br><hr width="100" align="left"><div class="footnote"><p><sup>[<a id="ftn.d656e106" href="#d656e106" class="para">1</a>] </sup>The latest JVMs do better
         with regard to fragmentation, so make sure you are running a recent release.
         Read down in the message,
-        <a class="link" href="http://osdir.com/ml/hotspot-gc-use/2011-11/msg00002.html" target="_top">Identifying concurrent mode failures caused by fragmentation</a>.</p></div><div class="footnote"><p><sup>[<a id="ftn.d656e573" href="#d656e573" class="para">2</a>] </sup>For description of the development process -- why static blooms
+        <a class="link" href="http://osdir.com/ml/hotspot-gc-use/2011-11/msg00002.html" target="_top">Identifying concurrent mode failures caused by fragmentation</a>.</p></div><div class="footnote"><p><sup>[<a id="ftn.d656e587" href="#d656e587" class="para">2</a>] </sup>For description of the development process -- why static blooms
         rather than dynamic -- and for an overview of the unique properties
         that pertain to blooms in HBase, as well as possible future
         directions, see the <span class="emphasis"><em>Development Process</em></span> section
         of the document <a class="link" href="https://issues.apache.org/jira/secure/attachment/12444007/Bloom_Filters_in_HBase.pdf" target="_top">BloomFilters
-        in HBase</a> attached to <a class="link" href="https://issues.apache.org/jira/browse/HBASE-1200" target="_top">HBase-1200</a>.</p></div><div class="footnote"><p><sup>[<a id="ftn.d656e585" href="#d656e585" class="para">3</a>] </sup>The bloom filters described here are actually version two of
+        in HBase</a> attached to <a class="link" href="https://issues.apache.org/jira/browse/HBASE-1200" target="_top">HBase-1200</a>.</p></div><div class="footnote"><p><sup>[<a id="ftn.d656e599" href="#d656e599" class="para">3</a>] </sup>The bloom filters described here are actually version two of
         blooms in HBase. In versions up to 0.19.x, HBase had a dynamic bloom
         option based on work done by the <a class="link" href="http://www.one-lab.org" target="_top">European Commission One-Lab
         Project 034819</a>. The core of the HBase bloom work was later
         pulled up into Hadoop to implement org.apache.hadoop.io.BloomMapFile.
         Version 1 of HBase blooms never worked that well. Version 2 is a
         rewrite from scratch though again it starts with the one-lab
-        work.</p></div><div class="footnote"><p><sup>[<a id="ftn.d656e748" href="#d656e748" class="para">4</a>] </sup>See JD's <a class="link" href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf" target="_top">Performance Talk</a></p></div></div></div><div id="disqus_thread"></div><script type="text/javascript">
+        work.</p></div><div class="footnote"><p><sup>[<a id="ftn.d656e762" href="#d656e762" class="para">4</a>] </sup>See JD's <a class="link" href="http://files.meetup.com/1350427/hug_ebay_jdcryans.pdf" target="_top">Performance Talk</a></p></div></div></div><div id="disqus_thread"></div><script type="text/javascript">
     var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
     var disqus_url = 'http://hbase.apache.org/book';
     var disqus_identifier = 'performance';

Modified: hbase/hbase.apache.org/trunk/performance/perf.casestudy.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/performance/perf.casestudy.html?rev=1462679&r1=1462678&r2=1462679&view=diff
==============================================================================
--- hbase/hbase.apache.org/trunk/performance/perf.casestudy.html (original)
+++ hbase/hbase.apache.org/trunk/performance/perf.casestudy.html Sat Mar 30 00:19:55 2013
@@ -1,6 +1,6 @@
 <html><head>
       <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
-   <title>1.12.&nbsp;Case Studies</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="performance.html" title="Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning"><link rel="up" href="performance.html" title="Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning"><link rel="prev" href="perf.ec2.html" title="1.11.&nbsp;Amazon EC2"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">1.12.&nbsp;Case Studies</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="perf.ec2.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;</td></tr></table><hr></div><div class="section" title="1.12.&nbsp;Case Studies"><div class="titlepage"><div><div><h2 class="title"
  style="clear: both"><a name="perf.casestudy"></a>1.12.&nbsp;Case Studies</h2></div></div></div><p>For Performance and Troubleshooting Case Studies, see <a class="xref" href="">???</a>.
+   <title>1.13.&nbsp;Case Studies</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="performance.html" title="Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning"><link rel="up" href="performance.html" title="Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning"><link rel="prev" href="perf.ec2.html" title="1.12.&nbsp;Amazon EC2"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">1.13.&nbsp;Case Studies</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="perf.ec2.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;</td></tr></table><hr></div><div class="section" title="1.13.&nbsp;Case Studies"><div class="titlepage"><div><div><h2 class="title"
  style="clear: both"><a name="perf.casestudy"></a>1.13.&nbsp;Case Studies</h2></div></div></div><p>For Performance and Troubleshooting Case Studies, see <a class="xref" href="">???</a>.
       </p></div><div id="disqus_thread"></div><script type="text/javascript">
     var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
     var disqus_url = 'http://hbase.apache.org/book';
@@ -12,4 +12,4 @@
         dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
         (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
     })();
-</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="perf.ec2.html">Prev</a>&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;</td></tr><tr><td width="40%" align="left" valign="top">1.11.&nbsp;Amazon EC2&nbsp;</td><td width="20%" align="center"><a accesskey="h" href="performance.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;</td></tr></table></div></body></html>
\ No newline at end of file
+</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="perf.ec2.html">Prev</a>&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;</td></tr><tr><td width="40%" align="left" valign="top">1.12.&nbsp;Amazon EC2&nbsp;</td><td width="20%" align="center"><a accesskey="h" href="performance.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;</td></tr></table></div></body></html>
\ No newline at end of file

Modified: hbase/hbase.apache.org/trunk/performance/perf.deleting.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/performance/perf.deleting.html?rev=1462679&r1=1462678&r2=1462679&view=diff
==============================================================================
--- hbase/hbase.apache.org/trunk/performance/perf.deleting.html (original)
+++ hbase/hbase.apache.org/trunk/performance/perf.deleting.html Sat Mar 30 00:19:55 2013
@@ -1,10 +1,10 @@
 <html><head>
       <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
-   <title>1.9.&nbsp;Deleting from HBase</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="performance.html" title="Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning"><link rel="up" href="performance.html" title="Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning"><link rel="prev" href="perf.reading.html" title="1.8.&nbsp;Reading from HBase"><link rel="next" href="perf.hdfs.html" title="1.10.&nbsp;HDFS"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">1.9.&nbsp;Deleting from HBase</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="perf.reading.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="perf.hdfs.html">Ne
 xt</a></td></tr></table><hr></div><div class="section" title="1.9.&nbsp;Deleting from HBase"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.deleting"></a>1.9.&nbsp;Deleting from HBase</h2></div></div></div><div class="section" title="1.9.1.&nbsp;Using HBase Tables as Queues"><div class="titlepage"><div><div><h3 class="title"><a name="perf.deleting.queue"></a>1.9.1.&nbsp;Using HBase Tables as Queues</h3></div></div></div><p>HBase tables are sometimes used as queues.  In this case, special care must be taken to regularly perform major compactions on tables used in
+   <title>1.10.&nbsp;Deleting from HBase</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="performance.html" title="Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning"><link rel="up" href="performance.html" title="Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning"><link rel="prev" href="perf.reading.html" title="1.9.&nbsp;Reading from HBase"><link rel="next" href="perf.hdfs.html" title="1.11.&nbsp;HDFS"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">1.10.&nbsp;Deleting from HBase</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="perf.reading.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="perf.hdfs.html">
 Next</a></td></tr></table><hr></div><div class="section" title="1.10.&nbsp;Deleting from HBase"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.deleting"></a>1.10.&nbsp;Deleting from HBase</h2></div></div></div><div class="section" title="1.10.1.&nbsp;Using HBase Tables as Queues"><div class="titlepage"><div><div><h3 class="title"><a name="perf.deleting.queue"></a>1.10.1.&nbsp;Using HBase Tables as Queues</h3></div></div></div><p>HBase tables are sometimes used as queues.  In this case, special care must be taken to regularly perform major compactions on tables used in
        this manner.  As is documented in <a class="xref" href="">???</a>, marking rows as deleted creates additional StoreFiles which then need to be processed
        on reads.  Tombstones only get cleaned up with major compactions.
        </p><p>See also <a class="xref" href="">???</a> and <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HBaseAdmin.html#majorCompact%28java.lang.String%29" target="_top">HBaseAdmin.majorCompact</a>.
-       </p></div><div class="section" title="1.9.2.&nbsp;Delete RPC Behavior"><div class="titlepage"><div><div><h3 class="title"><a name="perf.deleting.rpc"></a>1.9.2.&nbsp;Delete RPC Behavior</h3></div></div></div><p>Be aware that <code class="code">htable.delete(Delete)</code> doesn't use the writeBuffer.  It will execute an RegionServer RPC with each invocation.
+       </p></div><div class="section" title="1.10.2.&nbsp;Delete RPC Behavior"><div class="titlepage"><div><div><h3 class="title"><a name="perf.deleting.rpc"></a>1.10.2.&nbsp;Delete RPC Behavior</h3></div></div></div><p>Be aware that <code class="code">htable.delete(Delete)</code> doesn't use the writeBuffer.  It will execute a RegionServer RPC with each invocation.
        For a large number of deletes, consider <code class="code">htable.delete(List)</code>.
        </p><p>See <a class="link" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29" target="_top">http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#delete%28org.apache.hadoop.hbase.client.Delete%29</a>
        </p></div></div><div id="disqus_thread"></div><script type="text/javascript">
@@ -18,4 +18,4 @@
         dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
         (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
     })();
-</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="perf.reading.html">Prev</a>&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;<a accesskey="n" href="perf.hdfs.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.8.&nbsp;Reading from HBase&nbsp;</td><td width="20%" align="center"><a accesskey="h" href="performance.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;1.10.&nbsp;HDFS</td></tr></table></div></body></html>
\ No newline at end of file
+</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="perf.reading.html">Prev</a>&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;<a accesskey="n" href="perf.hdfs.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.9.&nbsp;Reading from HBase&nbsp;</td><td width="20%" align="center"><a accesskey="h" href="performance.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;1.11.&nbsp;HDFS</td></tr></table></div></body></html>
\ No newline at end of file

Modified: hbase/hbase.apache.org/trunk/performance/perf.ec2.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/performance/perf.ec2.html?rev=1462679&r1=1462678&r2=1462679&view=diff
==============================================================================
--- hbase/hbase.apache.org/trunk/performance/perf.ec2.html (original)
+++ hbase/hbase.apache.org/trunk/performance/perf.ec2.html Sat Mar 30 00:19:55 2013
@@ -1,6 +1,6 @@
 <html><head>
       <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
-   <title>1.11.&nbsp;Amazon EC2</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="performance.html" title="Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning"><link rel="up" href="performance.html" title="Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning"><link rel="prev" href="perf.hdfs.html" title="1.10.&nbsp;HDFS"><link rel="next" href="perf.casestudy.html" title="1.12.&nbsp;Case Studies"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">1.11.&nbsp;Amazon EC2</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="perf.hdfs.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="perf.casestudy.html">Next</a></td></tr><
 /table><hr></div><div class="section" title="1.11.&nbsp;Amazon EC2"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.ec2"></a>1.11.&nbsp;Amazon EC2</h2></div></div></div><p>Performance questions are common on Amazon EC2 environments because it is a shared environment.  You will
+   <title>1.12.&nbsp;Amazon EC2</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="performance.html" title="Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning"><link rel="up" href="performance.html" title="Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning"><link rel="prev" href="perf.hdfs.html" title="1.11.&nbsp;HDFS"><link rel="next" href="perf.casestudy.html" title="1.13.&nbsp;Case Studies"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">1.12.&nbsp;Amazon EC2</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="perf.hdfs.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="perf.casestudy.html">Next</a></td></tr><
 /table><hr></div><div class="section" title="1.12.&nbsp;Amazon EC2"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.ec2"></a>1.12.&nbsp;Amazon EC2</h2></div></div></div><p>Performance questions are common on Amazon EC2 environments because it is a shared environment.  You will
    not see the same throughput as a dedicated server.  In terms of running tests on EC2, run them several times for the same
    reason (i.e., it's a shared environment and you don't know what else is happening on the server).
    </p><p>If you are running on EC2 and post performance questions on the dist-list, please state this fact up-front that
@@ -16,4 +16,4 @@
         dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
         (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
     })();
-</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="perf.hdfs.html">Prev</a>&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;<a accesskey="n" href="perf.casestudy.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.10.&nbsp;HDFS&nbsp;</td><td width="20%" align="center"><a accesskey="h" href="performance.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;1.12.&nbsp;Case Studies</td></tr></table></div></body></html>
\ No newline at end of file
+</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="perf.hdfs.html">Prev</a>&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;<a accesskey="n" href="perf.casestudy.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.11.&nbsp;HDFS&nbsp;</td><td width="20%" align="center"><a accesskey="h" href="performance.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;1.13.&nbsp;Case Studies</td></tr></table></div></body></html>
\ No newline at end of file

Added: hbase/hbase.apache.org/trunk/performance/perf.general.html
URL: http://svn.apache.org/viewvc/hbase/hbase.apache.org/trunk/performance/perf.general.html?rev=1462679&view=auto
==============================================================================
--- hbase/hbase.apache.org/trunk/performance/perf.general.html (added)
+++ hbase/hbase.apache.org/trunk/performance/perf.general.html Sat Mar 30 00:19:55 2013
@@ -0,0 +1,31 @@
+<html><head>
+      <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
+   <title>1.7.&nbsp;HBase General Patterns</title><link rel="stylesheet" type="text/css" href="../css/freebsd_docbook.css"><meta name="generator" content="DocBook XSL-NS Stylesheets V1.76.1"><link rel="home" href="performance.html" title="Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning"><link rel="up" href="performance.html" title="Chapter&nbsp;1.&nbsp;Apache HBase (TM) Performance Tuning"><link rel="prev" href="perf.schema.html" title="1.6.&nbsp;Schema Design"><link rel="next" href="perf.writing.html" title="1.8.&nbsp;Writing to HBase"></head><body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"><div class="navheader"><table width="100%" summary="Navigation header"><tr><th colspan="3" align="center">1.7.&nbsp;HBase General Patterns</th></tr><tr><td width="20%" align="left"><a accesskey="p" href="perf.schema.html">Prev</a>&nbsp;</td><th width="60%" align="center">&nbsp;</th><td width="20%" align="right">&nbsp;<a accesskey="n" href="perf.
 writing.html">Next</a></td></tr></table><hr></div><div class="section" title="1.7.&nbsp;HBase General Patterns"><div class="titlepage"><div><div><h2 class="title" style="clear: both"><a name="perf.general"></a>1.7.&nbsp;HBase General Patterns</h2></div></div></div><div class="section" title="1.7.1.&nbsp;Constants"><div class="titlepage"><div><div><h3 class="title"><a name="perf.general.constants"></a>1.7.1.&nbsp;Constants</h3></div></div></div><p>When people get started with HBase they have a tendency to write code that looks like this:
+</p><pre class="programlisting">
+Get get = new Get(rowkey);
+Result r = htable.get(get);
+byte[] b = r.getValue(Bytes.toBytes("cf"), Bytes.toBytes("attr"));  // returns current version of value
+</pre><p>
+		However, inside loops and MapReduce jobs, repeatedly converting the column family and column names
+		to byte arrays is surprisingly expensive.
+		It is better to define the byte arrays once as constants, like this:
+</p><pre class="programlisting">
+public static final byte[] CF = Bytes.toBytes("cf");
+public static final byte[] ATTR = Bytes.toBytes("attr");
+...
+Get get = new Get(rowkey);
+Result r = htable.get(get);
+byte[] b = r.getValue(CF, ATTR);  // returns current version of value
+</pre><p>
+      </p></div></div><div id="disqus_thread"></div><script type="text/javascript">
+    var disqus_shortname = 'hbase'; // required: replace example with your forum shortname
+    var disqus_url = 'http://hbase.apache.org/book';
+    var disqus_identifier = 'perf.general';
+
+    /* * * DON'T EDIT BELOW THIS LINE * * */
+    (function() {
+        var dsq = document.createElement('script'); dsq.type = 'text/javascript'; dsq.async = true;
+        dsq.src = 'http://' + disqus_shortname + '.disqus.com/embed.js';
+        (document.getElementsByTagName('head')[0] || document.getElementsByTagName('body')[0]).appendChild(dsq);
+    })();
+</script><noscript>Please enable JavaScript to view the <a href="http://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript><a href="http://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a><div class="navfooter"><hr><table width="100%" summary="Navigation footer"><tr><td width="40%" align="left"><a accesskey="p" href="perf.schema.html">Prev</a>&nbsp;</td><td width="20%" align="center">&nbsp;</td><td width="40%" align="right">&nbsp;<a accesskey="n" href="perf.writing.html">Next</a></td></tr><tr><td width="40%" align="left" valign="top">1.6.&nbsp;Schema Design&nbsp;</td><td width="20%" align="center"><a accesskey="h" href="performance.html">Home</a></td><td width="40%" align="right" valign="top">&nbsp;1.8.&nbsp;Writing to HBase</td></tr></table></div></body></html>
\ No newline at end of file
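
The constants pattern described in the added perf.general.html can be tried without an HBase cluster. The sketch below is illustrative only: the class name ColumnConstants and the helper columnKey() are hypothetical and are not part of HBase or of this commit. It encodes the family and qualifier once, with an explicit UTF-8 charset, and reuses the resulting arrays on every call.

```java
import java.nio.charset.StandardCharsets;

public class ColumnConstants {
    // Encoded once at class-load time. UTF-8 is named explicitly so the bytes
    // do not depend on the platform default charset.
    public static final byte[] CF = "cf".getBytes(StandardCharsets.UTF_8);
    public static final byte[] ATTR = "attr".getBytes(StandardCharsets.UTF_8);

    // Hypothetical helper: joins the two constants as "family:qualifier"
    // without encoding any String on the call path.
    static byte[] columnKey() {
        byte[] key = new byte[CF.length + 1 + ATTR.length];
        System.arraycopy(CF, 0, key, 0, CF.length);
        key[CF.length] = (byte) ':';
        System.arraycopy(ATTR, 0, key, CF.length + 1, ATTR.length);
        return key;
    }

    public static void main(String[] args) {
        // Stands in for a per-row loop; the constants are reused on each pass.
        for (int i = 0; i < 3; i++) {
            System.out.println(new String(columnKey(), StandardCharsets.UTF_8));
        }
    }
}
```

Java arrays are mutable even when the field is final, so shared constants like these should be treated as read-only by callers.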


