hbase-user mailing list archives

From Henning Blohm <henning.bl...@zfabrik.de>
Subject Re: HBase 0.90.3 OOM at 1.5G heap
Date Tue, 12 Jul 2011 08:01:24 GMT
Good morning St.Ack,

the schema consists of one table and one column family, holding five 
columns with one string (<20 chars) and four double numbers (rather 
minimal really).

The load test runs in 24 concurrent mappers, each writing 500k rows, 
2000 runs in total.
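
As a rough sanity check of the load size, here is a minimal back-of-envelope sketch. The 20-byte upper bound for the string and all variable names are assumptions for illustration, and it counts raw cell values only: HBase also stores row key, column family, qualifier, and timestamp with every cell, so the real volume is larger.

```python
# Back-of-envelope size of the load test described above.
# Counts raw cell values only; per-cell key overhead is deliberately ignored.
ROWS_PER_RUN = 500_000      # rows written by each run (from the message)
TOTAL_RUNS = 2_000          # total runs (from the message)
MAX_STRING_BYTES = 20       # assumption: upper bound for the "<20 chars" string
DOUBLE_BYTES = 8            # size of one IEEE 754 double
DOUBLES_PER_ROW = 4         # four double columns per row (from the message)

total_rows = ROWS_PER_RUN * TOTAL_RUNS
value_bytes_per_row = MAX_STRING_BYTES + DOUBLES_PER_ROW * DOUBLE_BYTES
total_value_bytes = total_rows * value_bytes_per_row

print(f"rows: {total_rows:,}")
print(f"value bytes per row (upper bound): {value_bytes_per_row}")
print(f"total value bytes (upper bound): {total_value_bytes / 1e9:.0f} GB")
```

The row count matches the one billion rows the test aims for; the byte figure is only an upper bound on the raw values under the stated assumption.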

WAL is turned on.

And yes, it took down two region servers, and the processes were 
eventually gone. From the logs, however, it looked as if the region 
servers still tried to continue for a while after the first OOM.

They didn't get restarted and I had the impression the HMaster didn't 
respond to web requests either (but I shut it down quickly to restart 
the whole cluster - so not sure about that).

My hbase-env.sh is out-of-the-box except for the heap settings. So the 
GC config is

  -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode

Is that not aggressive enough?

hbase-site.xml is also standard, except for the cluster config (i.e. the 
ZooKeeper quorum config etc.).


Just noticed that there is a gc log. I will look into that as well.

Currently retrying with 2G heap.

Thanks,
   Henning

On 07/11/2011 06:24 PM, Stack wrote:
> On Mon, Jul 11, 2011 at 1:04 AM, Henning Blohm<henning.blohm@zfabrik.de>  wrote:
>> I am running HBASE 0.90.3 (just upgraded for testing). It is configured for
>> 1.5G heap, which seemed to be a good setting for HBASE 0.20.6. When running
>> a stress test that would write into three HBASE data nodes from 24 processes
>> with the goal of inserting one billion simple rows, I get OOMs at two of
>> three region servers after about 75% of the work is done.
>>
> What's your schema?  What's the size of your cells?  0.90 is different
> to 0.20.  1.5G is little memory but HBase should just work w/ 1G or
> more of heap.
>
>> Here is the first OOM:
>>
>> 2011-07-09 23:34:40,988 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
>> Applied 924, skipped 1105, firstSequenceidInLog=162957072,
>> maxSequenceidInLog=163841413
> This looks like you are crashing regionservers.  Is that so?  What's
> your current GC config?
>
>
>> Now:
>>
>> 1. Is there any way to configure some stable heap size? Where is the leak?
>> This is really frustrating (it took a while to figure out 1.5G was "somehow
>> good" for 0.20.6)
>>
> Start big.  Give it 8Gs?  See how it does then.
>
> How many handlers are you running with?
>
>
>> 2. Wouldn't it make sense to let the region server die at the first OOM and
>> have it restarted quickly rather than letting it go on in some likely broken
>> state after the OOM until it eventually dies anyway?
>>
> Don't we do this currently?  Only time this does not happen is when
> the OOME happens out at extremities in RPC which we do not directly
> control (We should fix that).  It catches OOME and then tries to keep
> going.  Otherwise, if OOME, we'll release the reservoir of memory that
> we've been holding back so we can shut ourselves down.
>
> St.Ack


-- 

*Henning Blohm*

*ZFabrik Software KG*

T: 	+49/62278399955
F: 	+49/62278399956
M: 	+49/1781891820

Bunsenstrasse 1
69190 Walldorf

henning.blohm@zfabrik.de <mailto:henning.blohm@zfabrik.de>
Linkedin <http://de.linkedin.com/pub/henning-blohm/0/7b5/628>
www.zfabrik.de <http://www.zfabrik.de>
www.z2-environment.eu <http://www.z2-environment.eu>

