hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "PerformanceTuning" by stack
Date Thu, 24 Feb 2011 19:05:40 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "PerformanceTuning" page has been changed by stack.
http://wiki.apache.org/hadoop/PerformanceTuning?action=diff&rev1=9&rev2=10

--------------------------------------------------

   * Ram, ram, ram.  Don't starve HBase.
   * More CPUs are important, as you will see in the next section
   * Use a 64-bit platform, and a 64-bit JVM.
-  * Your clients might need tuning: [[http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html]]
+  * Your clients might need tuning: [[http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html|Performance
of hbase importing]]
   * Make sure that the command {{{java}}} implies {{{-server}}} on your machines, or else
you will have to explicitly enable it.
   * Are you swapping?  JVMs hate swapping.  Consider removing swap space.
   * By default, each regionserver puts up 10 listeners only.  Up it if you have measurable
traffic (See hbase.regionserver.handler.count in hbase-default.xml).
   * To speed up the inserts in a non-critical job (like an import job), you can use Put.setWriteToWAL(false)
to bypass writing to the write ahead log.
-  * New in HBase 0.21 (not released yet), you can set a table-scope attribute to defer the
write ahead log's flushes in order to improve write performance, this is now the default behavior.
Use HTableDescriptor.setDeferredLogFlush(false) when creating your table to instead make sure
every WAL edit is flushed to the Datanodes. The default setting of hbase.regionserver.optionallogflushinterval
is 1000 which means that in the worst case you lose 1 second of edits. Since the WAL is shared
on a region server for all regions, any other table not using this feature will flush your
other table's edits. 
+  * You can set a table-scope attribute to defer the write ahead log's flushes in order to
improve write performance; this is now the default behavior. Use HTableDescriptor.setDeferredLogFlush(false)
when creating your table to instead make sure every WAL edit is flushed to the Datanodes.
The default setting of hbase.regionserver.optionallogflushinterval is 1000 (milliseconds), which
means that in the worst case you lose 1 second of edits. Since the WAL is shared by all regions
on a region server, an edit from any table that does not use this feature also flushes the
pending edits of tables that do. 
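
Two of the checks in the list above (whether {{{java}}} implies {{{-server}}}, and whether the machine is swapping) can be scripted. The following is only a minimal sketch: the helper names are hypothetical, it assumes a Linux host with /proc/meminfo, and it assumes the VM banner printed by {{{java -version}}} contains the text "Server VM" when the server compiler is in use.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not part of HBase.

# Succeeds if the given `java -version` output names a Server VM.
is_server_vm() {
  printf '%s\n' "$1" | grep -q 'Server VM'
}

# Prints swap currently in use, in kB, read from a /proc/meminfo-style
# file (Linux assumed; defaults to /proc/meminfo).
swap_used_kb() {
  awk '/^SwapTotal:/ {t = $2} /^SwapFree:/ {f = $2} END {print t - f}' \
    "${1:-/proc/meminfo}"
}
```

For example, {{{is_server_vm "$(java -version 2>&1)" || echo "pass -server explicitly"}}} flags a client VM, and a non-zero result from {{{swap_used_kb}}} suggests the machine has swapped at some point.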
  
  == HBase JVM and GC ==
  
  HBase is memory intensive, and using the default GC you can see long pauses in all threads.
 With the addition of ZooKeeper this can cause false errors, as ZooKeeper and the HBase master
think a regionserver has died.  
  
- To avoid this, you must use Java6 CMS: [[http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html]]
+ To avoid this, you must use Java6 CMS: [[http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html|GC
Tuning]]
  
  To enable, in hbase-env.sh add: {{{
  export HBASE_OPTS="-XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-Xloggc:/home/hadoop/hbase/logs/gc-hbase.log"
@@ -39, +39 @@

  
  Adjust the log directory to wherever you log.
  
- the CMS collector will use many threads to do the concurrent sweeping of your heap, if you
are running on a 2 cpu system, you should probably also enable Incremental mode [[http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#icms]]:
{{{
+ The CMS collector uses many threads to sweep your heap concurrently. If you
are running on a 2 CPU system, you should probably also enable [[http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#icms|Incremental
CMS mode]]: {{{
  export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode <other options>"
  }}}
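
If the same hbase-env.sh is deployed to machines with different core counts, the choice of incremental mode can be made at startup. This is a hedged sketch, not a recommendation from the page itself: the function name is hypothetical, and treating "2 CPUs or fewer" as the cutoff is an assumption drawn from the sentence above.

```shell
#!/bin/sh
# Sketch only: cms_opts_for_cpus is a hypothetical helper, and the
# "<= 2 CPUs" cutoff is an assumption based on the advice above.

# Prints the CMS flags for a machine with the given number of CPUs.
cms_opts_for_cpus() {
  cpus="$1"
  opts="-XX:+UseConcMarkSweepGC"
  if [ "$cpus" -le 2 ]; then
    # Few cores: let CMS yield to application threads periodically.
    opts="$opts -XX:+CMSIncrementalMode"
  fi
  printf '%s\n' "$opts"
}
```

In hbase-env.sh this could be used as {{{export HBASE_OPTS="$(cms_opts_for_cpus "$(getconf _NPROCESSORS_ONLN)") <other options>"}}}, assuming {{{getconf _NPROCESSORS_ONLN}}} is available on your platform.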
  
@@ -90, +90 @@

  
  The key point here is to keep all these pauses low.  CMS pauses are always low, but if
your ParNew starts growing, you can see minor GC pauses approach 100ms, exceed 100ms and hit
as high as 400ms.
  
- This can be due to the size of the ParNew, which should be relatively small.  If your ParNew
is very large after running HBase for a while, in one example a ParNew was about 150MB, then
you might have to constrain the size of ParNew.
+ This can be due to the size of the ParNew, which should be relatively small.  If your ParNew
is very large after running HBase for a while (in one example a ParNew was about 150MB), then
you might have to constrain the size of ParNew. The larger it is, the longer the collections
take, but if it's too small, objects are promoted to old gen too quickly.  Below, we
constrain new gen size to 64m.
  
  Add this to HBASE_OPTS: {{{
- export HBASE_OPTS="-XX:NewSize=6m -XX:MaxNewSize=6m <cms options from above> <gc
logging options from above>"
+ export HBASE_OPTS="-XX:NewSize=64m -XX:MaxNewSize=64m <cms options from above> <gc
logging options from above>"
  }}}
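
To see whether ParNew pauses are actually crossing the 100ms mark before and after constraining the new gen, the GC log enabled earlier can be scanned. This is a rough sketch under stated assumptions: the function name is hypothetical, and it assumes the log lines look like the usual {{{-XX:+PrintGCDetails}}} output, for example {{{[ParNew: 59008K->6528K(59008K), 0.0323570 secs]}}}. Other JVM versions may format these lines differently.

```shell
#!/bin/sh
# Sketch only: slow_parnew is a hypothetical helper. It assumes ParNew
# entries of the form "[ParNew: <before>-><after>(<size>), <pause> secs]".

# Prints the pause (in seconds) of every ParNew collection in the log
# file $1 that took longer than $2 seconds (default 0.1, i.e. 100ms).
slow_parnew() {
  awk -v limit="${2:-0.1}" '
    /\[ParNew/ {
      if (match($0, /ParNew[^]]*, [0-9.]+ secs/)) {
        s = substr($0, RSTART, RLENGTH)
        sub(/.*, /, "", s)    # drop everything before the pause value
        sub(/ secs/, "", s)   # drop the unit
        if (s + 0 > limit + 0) print s
      }
    }' "$1"
}
```

Run it against the file named in {{{-Xloggc}}}, for instance {{{slow_parnew /home/hadoop/hbase/logs/gc-hbase.log}}}; no output means no ParNew pause exceeded the limit.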
  
+ See also the [[http://hbase.apache.org/book.html|HBase book]] for more on GC tuning, in
particular for suggestions dealing with long-running stop-the-world GCs.
+ 
