hbase-commits mailing list archives

From dm...@apache.org
Subject svn commit: r1206362 - in /hbase/trunk/src/docbkx: book.xml configuration.xml performance.xml troubleshooting.xml
Date Fri, 25 Nov 2011 22:20:49 GMT
Author: dmeil
Date: Fri Nov 25 22:20:48 2011
New Revision: 1206362

URL: http://svn.apache.org/viewvc?rev=1206362&view=rev
Log:
HBASE-4871 hbase book. docs cleanup.

Modified:
    hbase/trunk/src/docbkx/book.xml
    hbase/trunk/src/docbkx/configuration.xml
    hbase/trunk/src/docbkx/performance.xml
    hbase/trunk/src/docbkx/troubleshooting.xml

Modified: hbase/trunk/src/docbkx/book.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/book.xml?rev=1206362&r1=1206361&r2=1206362&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/book.xml (original)
+++ hbase/trunk/src/docbkx/book.xml Fri Nov 25 22:20:48 2011
@@ -1556,41 +1556,30 @@ scan.setFilter(filter);
 
     <section xml:id="regions.arch">
     <title>Regions</title>
-    <para>This section is all about Regions.</para>
-    <note>
-        <para>Regions are comprised of a Store per Column Family.
-        </para>
-    </note>
+    <para>Regions are the basic element of availability and
+     distribution for tables, and are comprised of a Store per Column Family.
+    </para>
 
     <section xml:id="arch.regions.size">
       <title>Region Size</title>
 
-      <para>Region size is one of those tricky things, there are a few factors
+      <para>Determining the "right" region size can be tricky, and there are a few factors
       to consider:</para>
 
       <itemizedlist>
         <listitem>
-          <para>Regions are the basic element of availability and
-          distribution.</para>
-        </listitem>
-
-        <listitem>
           <para>HBase scales by having regions across many servers. Thus if
-          you have 2 regions for 16GB data, on a 20 node machine you are a net
-          loss there.</para>
+          you have 2 regions for 16GB data, on a 20 node machine your data
+          will be concentrated on just a few machines - nearly the entire
+          cluster will be idle.  This really cant be stressed enough, since a 
+          common problem is loading 200MB data into HBase then wondering why 
+          your awesome 10 node cluster isn't doing anything.</para>
         </listitem>
 
         <listitem>
-          <para>High region count has been known to make things slow, this is
-          getting better, but it is probably better to have 700 regions than
-          3000 for the same amount of data.</para>
-        </listitem>
-
-        <listitem>
-          <para>Low region count prevents parallel scalability as per point
-          #2. This really cant be stressed enough, since a common problem is
-          loading 200MB data into HBase then wondering why your awesome 10
-          node cluster is mostly idle.</para>
+          <para>On the other hand, high region count has been known to make things slow.
+          This is getting better with each release of HBase, but it is probably better to have
+          700 regions than 3000 for the same amount of data.</para>
         </listitem>
 
         <listitem>
@@ -1599,10 +1588,12 @@ scan.setFilter(filter);
         </listitem>
       </itemizedlist>
 
-      <para>Its probably best to stick to the default, perhaps going smaller
-      for hot tables (or manually split hot regions to spread the load over
-      the cluster), or go with a 1GB region size if your cell sizes tend to be
+      <para>When starting off, its probably best to stick to the default region-size, perhaps going
+      smaller for hot tables (or manually split hot regions to spread the load over
+      the cluster), or go with larger region sizes if your cell sizes tend to be
       largish (100k and up).</para>
+      <para>See <xref linkend="bigger.regions"/> for more information on configuration.
+      </para>
     </section>
 
       <section>

Modified: hbase/trunk/src/docbkx/configuration.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/configuration.xml?rev=1206362&r1=1206361&r2=1206362&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/configuration.xml (original)
+++ hbase/trunk/src/docbkx/configuration.xml Fri Nov 25 22:20:48 2011
@@ -1028,6 +1028,11 @@ index e70ebc6..96f8c27 100644
           throughput is affected since every request that hits that region server will take longer,
           which exacerbates the problem even more.
           </para>
+          <para>You can get a sense of whether you have too little or too many handlers by
+            <xref linkend="rpc.logging" />
+            on an individual RegionServer then tailing its logs (Queued requests
+            consume memory).
+            </para>
           </section>
       <section xml:id="big_memory">
         <title>Configuration for large memory machines</title>
@@ -1054,11 +1059,20 @@ index e70ebc6..96f8c27 100644
       Consider going to larger regions to cut down on the total number of regions
       on your cluster. Generally less Regions to manage makes for a smoother running
       cluster (You can always later manually split the big Regions should one prove
-      hot and you want to spread the request load over the cluster).  By default,
-      regions are 256MB in size.  You could run with
-      1G.  Some run with even larger regions; 4G or even larger.  Adjust
-      <code>hbase.hregion.max.filesize</code> in your <filename>hbase-site.xml</filename>.
+      hot and you want to spread the request load over the cluster).  A lower number of regions is
+       preferred, generally in the range of 20 to low-hundreds
+       per RegionServer.  Adjust the regionsize as appropriate to achieve this number. 
+       </para>
+       <para>For the 0.90.x codebase, the upper-bound of regionsize is about 4Gb, with a default of 256Mb.
+       For 0.92.x codebase, due to the HFile v2 change much larger regionsizes can be supported (e.g., 20Gb).
+       </para>
+       <para>You may need to experiment with this setting based on your hardware configuration and application needs.
+       </para>
+       <para>Adjust <code>hbase.hregion.max.filesize</code> in your <filename>hbase-site.xml</filename>.
+       RegionSize can also be set on a per-table basis via 
+       <link xlink:href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/HTableDescriptor.html">HTableDescriptor</link>.
       </para>
+      
       </section>
       <section xml:id="disable.splitting">
       <title>Managed Splitting</title>
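
Note on the configuration.xml hunk above: the new text says to adjust hbase.hregion.max.filesize in hbase-site.xml. A minimal sketch of what such an entry might look like follows. The 1 GB value is only an illustrative assumption, not a recommendation made by this commit; the property takes a size in bytes, so the 256 MB default mentioned for 0.90.x corresponds to 268435456.

```xml
<!-- hbase-site.xml: illustrative fragment only -->
<configuration>
  <property>
    <name>hbase.hregion.max.filesize</name>
    <!-- Assumed example value: 1 GB = 1024 * 1024 * 1024 bytes -->
    <value>1073741824</value>
  </property>
</configuration>
```

As the hunk notes, the region size can also be set per table through HTableDescriptor, which leaves the cluster-wide default in place for other tables.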

Modified: hbase/trunk/src/docbkx/performance.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/performance.xml?rev=1206362&r1=1206361&r2=1206362&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/performance.xml (original)
+++ hbase/trunk/src/docbkx/performance.xml Fri Nov 25 22:20:48 2011
@@ -140,14 +140,6 @@
       <para>The number of regions for an HBase table is driven by the <xref
               linkend="bigger.regions" />. Also, see the architecture
           section on <xref linkend="arch.regions.size" /></para>
-       <para>A lower number of regions is preferred, generally in the range of 20 to low-hundreds
-       per RegionServer.  Adjust the regionsize as appropriate to achieve this number. 
-       </para>
-       <para>For the 0.90.x codebase, the upper-bound of regionsize is about 4Gb.
-       For 0.92.x codebase, due to the HFile v2 change much larger regionsizes can be supported (e.g., 20Gb).
-       </para>
-       <para>You may need to experiment with this setting based on your hardware configuration and application needs.
-       </para>
     </section>
 
     <section xml:id="perf.compactions.and.splits">
@@ -161,15 +153,7 @@
     <section xml:id="perf.handlers">
         <title><varname>hbase.regionserver.handler.count</varname></title>
         <para>See <xref linkend="hbase.regionserver.handler.count"/>. 
-            This setting in essence sets how many requests are
-            concurrently being processed inside the RegionServer at any
-            one time.  If set too high, then throughput may suffer as
-            the concurrent requests contend; if set too low, requests will
-            be stuck waiting to get into the machine.  You can get a
-            sense of whether you have too little or too many handlers by
-            <xref linkend="rpc.logging" />
-            on an individual RegionServer then tailing its logs (Queued requests
-            consume memory).</para>
+	    </para>
     </section>
     <section xml:id="perf.hfile.block.cache.size">
         <title><varname>hfile.block.cache.size</varname></title>
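
Note on the perf.handlers hunk above: the text that moved to configuration.xml concerns hbase.regionserver.handler.count, which sets how many requests a RegionServer processes concurrently. A minimal sketch of the corresponding hbase-site.xml entry follows. The value 30 is a hypothetical starting point chosen for illustration; the commit itself gives no number, and the text suggests judging the setting by enabling RPC logging on one RegionServer and tailing its logs.

```xml
<!-- hbase-site.xml: illustrative fragment only -->
<configuration>
  <property>
    <name>hbase.regionserver.handler.count</name>
    <!-- Assumed example value; too high and concurrent requests contend,
         too low and requests wait to get into the RegionServer -->
    <value>30</value>
  </property>
</configuration>
```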

Modified: hbase/trunk/src/docbkx/troubleshooting.xml
URL: http://svn.apache.org/viewvc/hbase/trunk/src/docbkx/troubleshooting.xml?rev=1206362&r1=1206361&r2=1206362&view=diff
==============================================================================
--- hbase/trunk/src/docbkx/troubleshooting.xml (original)
+++ hbase/trunk/src/docbkx/troubleshooting.xml Fri Nov 25 22:20:48 2011
@@ -574,6 +574,18 @@ hadoop   17789  155 35.2 9067824 8604364
        </section>    
      </section>
         
+    <section xml:id="trouble.network">
+      <title>Network</title>
+      <section xml:id="trouble.network.spikes">
+        <title>Network Spikes</title>
+        <para>If you are seeing periodic network spikes you might want to check the compactionQueues to see if major 
+        compactions are happening.
+        </para>
+        <para>See <xref linkend="managed.compactions"/> for more information on managing compactions.
+        </para>
+        </section>
+    </section>
+        
     <section xml:id="trouble.rs">
       <title>RegionServer</title>
         <para>For more information on the RegionServers, see <xref linkend="regionserver.arch"/>.



