hbase-user mailing list archives

From Henning Blohm <henning.bl...@zfabrik.de>
Subject HBase 0.90.3 OOM at 1.5G heap
Date Mon, 11 Jul 2011 08:04:03 GMT
Hi,

I am running HBase 0.90.3 (just upgraded for testing). It is configured 
for a 1.5G heap, which seemed to be a good setting for HBase 0.20.6. When 
running a stress test that writes to three HBase region servers from 
24 processes, with the goal of inserting one billion simple rows, I get 
OOMs at two of the three region servers after about 75% of the work is done.

Here is the first OOM:

2011-07-09 23:34:40,988 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Applied 924, skipped 1105, firstSequenceidInLog=162957072, maxSequenceidInLog=163841413
2011-07-09 23:34:40,988 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush for tir_items,customer/7/8CC6E17710156EE5518325B96E5F5EB9FF3278D2F2E8848E859E90CC7445AE8E,1309973529621.39f9da510435c2bc053fab116af0d4d6., current region memstore size 270.7k; wal is null, using passed sequenceid=163841413
2011-07-09 23:34:40,989 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Finished snapshotting, commencing flushing stores
2011-07-09 23:34:43,266 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://tirmaster:9000/hbase/tir_items/0fb951f11fe3caef6c5ad5595ffda9ea/original1/2395129059875563550, isReference=false, isBulkLoadResult=false, seqid=150362469, majorCompaction=false
2011-07-09 23:34:51,788 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://tirmaster:9000/hbase/tir_items/0fb951f11fe3caef6c5ad5595ffda9ea/original1/2547547152617947847, isReference=false, isBulkLoadResult=false, seqid=163671317, majorCompaction=false
2011-07-09 23:34:58,652 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://tirmaster:9000/hbase/tir_items/0fb951f11fe3caef6c5ad5595ffda9ea/original1/2867700810527601701, isReference=false, isBulkLoadResult=false, seqid=150617582, majorCompaction=false
2011-07-09 23:35:35,067 ERROR org.apache.hadoop.hbase.executor.EventHandler: Caught throwable while processing event M_RS_OPEN_REGION
java.lang.OutOfMemoryError: Java heap space
         at org.apache.hadoop.hbase.io.hfile.HFile$Reader.readAllIndex(HFile.java:805)
         at org.apache.hadoop.hbase.io.hfile.HFile$Reader.loadFileInfo(HFile.java:832)
         at org.apache.hadoop.hbase.regionserver.StoreFile$Reader.loadFileInfo(StoreFile.java:1002)
         at org.apache.hadoop.hbase.regionserver.StoreFile.open(StoreFile.java:382)
         at org.apache.hadoop.hbase.regionserver.StoreFile.createReader(StoreFile.java:438)
         at org.apache.hadoop.hbase.regionserver.Store.loadStoreFiles(Store.java:266)
         at org.apache.hadoop.hbase.regionserver.Store.<init>(Store.java:208)
         at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:2008)
         at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:346)
         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2551)
         at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2537)
         at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:272)
         at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:99)
         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:156)
         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
         at java.lang.Thread.run(Thread.java:662)


More OOMs then follow until something fatal happens.

Now:

1. Is there any way to configure a stable heap size? Where is the 
leak? This is really frustrating (it took a while to figure out that 1.5G 
was "somehow good" for 0.20.6).

2. Wouldn't it make sense to let the region server die at the first OOM 
and have it restarted quickly, rather than letting it carry on in a 
likely broken state until it eventually dies anyway?
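In the meantime I suppose the fail-fast behavior could be approximated with the standard HotSpot flag that runs a command on the first OOM, again via conf/hbase-env.sh (%p is the JVM's own pid; an external supervisor would then have to do the restart):

```shell
# conf/hbase-env.sh -- kill the region server immediately on the first OOM
# instead of letting it limp on in a broken state; a supervisor process
# (or init script) is then responsible for restarting it
export HBASE_OPTS="$HBASE_OPTS -XX:OnOutOfMemoryError=\"kill -9 %p\""
```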

But on the good side, 0.90.3 is notably faster at writing than 0.20.6.

Thanks,

*Henning Blohm*

*ZFabrik Software KG*

T: 	+49/62278399955
F: 	+49/62278399956
M: 	+49/1781891820

Bunsenstrasse 1
69190 Walldorf

henning.blohm@zfabrik.de <mailto:henning.blohm@zfabrik.de>
Linkedin <http://de.linkedin.com/pub/henning-blohm/0/7b5/628>
www.zfabrik.de <http://www.zfabrik.de>
www.z2-environment.eu <http://www.z2-environment.eu>

