hbase-user mailing list archives

From Chris Tarnas <...@email.com>
Subject Re: how to make tuning for hbase (every couple of days hbase region server/s crash)
Date Wed, 24 Aug 2011 16:06:25 GMT


We had a similar OOME problem, and we solved it by allocating more heap space. The underlying
cause for us was that as the table grew, the StoreFileIndex grew, taking up a larger and larger
chunk of heap.

What caused this to be a problem is that the Memstore grows rapidly during inserts and its size
limits are not StoreFileIndex aware. After doing some heavy inserts, the Memstore + StoreFileIndex
exceeds the heap. If you restart the regionserver then the Memstore is flushed, you are
well under heap, and all appears well. Something similar could happen with the BlockCache too,
but we didn't directly see that.

We fixed this by allocating more heap and reducing the StoreFileIndex size by increasing the
hfile block size and using shorter keys and column/column family names. 
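
For illustration, here is a minimal sketch of that kind of schema change using the
0.90-era Java client API: a larger HFile block size on a column family with a
deliberately short name. The table and family names are placeholders, not anything
from this thread.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateCompactIndexTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Short table and family names keep each entry in the store file index small.
        HTableDescriptor desc = new HTableDescriptor("t");
        HColumnDescriptor fam = new HColumnDescriptor("f");

        // Larger HFile blocks mean fewer index entries per store file, so the
        // index held in RegionServer heap shrinks (at some cost to random-read
        // latency). 64 KB is the default; 256 KB here is just an example value.
        fam.setBlocksize(256 * 1024);

        desc.addFamily(fam);
        admin.createTable(desc);
    }
}

The trade-off is read latency versus heap: since the index keeps one entry per block,
doubling the block size roughly halves the index size but means more data is read per seek.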

-chris


On Aug 24, 2011, at 12:35 AM, Oleg Ruchovets wrote:

> Thanks for your feedback.
> 
> The point is that once we restart HBase, the memory footprint is far below 4 GB.
> The system runs well for a couple of days and then the heap reaches 4 GB, which
> causes the region server to crash.
> 
> This may indicate a memory leak, since once we restart HBase the problem is
> solved (or maybe it's just a configuration problem?).
> 
> I'm afraid that giving more memory to the region server (8 GB) will only postpone
> the problem, meaning the region server will still crash, just less frequently.
> 
> How do you think we should tackle this problem?
> 
> Best,
> Oleg
> 
> 
> 
> 
> On Wed, Aug 24, 2011 at 6:52 AM, Michael Segel <michael_segel@hotmail.com> wrote:
> 
>> 
>> I won't say you're crazy but .5 GB per mapper?
>> 
>> I would say tune conservatively, as you are suggesting with 1 GB for the OS, but
>> I'd also suggest tuning to 80% utilization instead of 100% utilization.
>> 
>>> From: buttler1@llnl.gov
>>> To: user@hbase.apache.org
>>> Date: Tue, 23 Aug 2011 16:35:22 -0700
>>> Subject: RE: how to make tuning for hbase (every couple of days hbase
>> region server/s crash)
>>> 
>>> So, if you use 0.5 GB / mapper and 1 GB / reducer, your total memory
>> consumption (minus hbase) on a slave node should be:
>>> 4 GB M/R tasks
>>> 1 GB OS -- just a guess
>>> 1 GB datanode
>>> 1 GB tasktracker
>>> Leaving you with up to 9 GB for your region servers.  I would suggest
>> bumping your region server RAM up to 8 GB, leaving a GB for OS caching. [I
>> am sure someone out there will tell me I am crazy]
>>> 
>>> 
>>> However, it is the log that is the most useful part of your email.
>> Unfortunately I haven't seen that error before.
>>> Are you using the Multi methods a lot in your code?
>>> 
>>> Dave
>>> 
>>> -----Original Message-----
>>> From: Oleg Ruchovets [mailto:oruchovets@gmail.com]
>>> Sent: Tuesday, August 23, 2011 1:38 PM
>>> To: user@hbase.apache.org
>>> Subject: Re: how to make tuning for hbase (every couple of days hbase
>> region server/s crash)
>>> 
>>> Thank you for the detailed response,
>>> 
>>> On Tue, Aug 23, 2011 at 7:49 PM, Buttler, David <buttler1@llnl.gov>
>> wrote:
>>> 
>>>> Have you looked at the logs of the region servers?  That is a good
>> first
>>>> place to look.
>>> 
>>> How many regions are in your system?
>>> 
>>> 
>>>         Region Servers
>>> 
>>> Address   Start Code      Load
>>> hadoop01  1314007529600   requests=0, regions=212, usedHeap=3171, maxHeap=3983
>>> hadoop02  1314007496109   requests=0, regions=207, usedHeap=2185, maxHeap=3983
>>> hadoop03  1314008874001   requests=0, regions=208, usedHeap=1955, maxHeap=3983
>>> hadoop04  1314008965432   requests=0, regions=209, usedHeap=2034, maxHeap=3983
>>> hadoop05  1314007496533   requests=0, regions=208, usedHeap=1970, maxHeap=3983
>>> hadoop06  1314008874036   requests=0, regions=208, usedHeap=1987, maxHeap=3983
>>> hadoop07  1314007496927   requests=0, regions=209, usedHeap=2118, maxHeap=3983
>>> hadoop08  1314007497034   requests=0, regions=211, usedHeap=2568, maxHeap=3983
>>> hadoop09  1314007497221   requests=0, regions=209, usedHeap=2148, maxHeap=3983
>>> master    1314008873765   requests=0, regions=208, usedHeap=2007, maxHeap=3962
>>> Total: servers: 10  requests=0, regions=2089
>>> 
>>> Most of the time GC succeeds in cleaning up, but every 3-4 days the used memory
>>> gets close to 4 GB,
>>>
>>> and there are a lot of exceptions like this:
>>> 
>>> org.apache.hadoop.ipc.HBaseServer: IPC Server Responder, call
>>> multi(org.apache.hadoop.hbase.client.MultiAction@491fb2f4)
>>> from 10.11.87.73:33737: output error
>>> 2011-08-14 18:37:36,264 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server
>>> handler 24 on 8041 caught: java.nio.channels.ClosedChannelException
>>>         at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:133)
>>>         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:324)
>>>         at org.apache.hadoop.hbase.ipc.HBaseServer.channelIO(HBaseServer.java:1387)
>>>         at org.apache.hadoop.hbase.ipc.HBaseServer.channelWrite(HBaseServer.java:1339)
>>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.processResponse(HBaseServer.java:727)
>>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Responder.doRespond(HBaseServer.java:792)
>>>         at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1083)
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>>> If you are using MSLAB, it reserves 2MB/region as a buffer -- that can
>> add
>>>> up when you have lots of regions.
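
As a rough illustration of why that adds up on this cluster, a back-of-the-envelope
calculation (assuming the default 2 MB MSLAB chunk size and roughly 210 regions per
server, as in the status listing above):

public class MslabOverhead {
    public static void main(String[] args) {
        long chunkBytes = 2L * 1024 * 1024; // default hbase.hregion.memstore.mslab.chunksize
        int regions = 210;                  // approximate per-server count from the cluster status
        long overheadMb = (chunkBytes * regions) / (1024 * 1024);
        // ~420 MB of a ~3983 MB region server heap is held by MSLAB buffers alone,
        // before any memstore data, block cache, or store file indexes.
        System.out.println("MSLAB floor: ~" + overheadMb + " MB");
    }
}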
>>>> 
>>>> 
>>> 
>>> 
>>> 
>>>> Given so little information all my guesses are going to be wild, but
>> they
>>>> might help:
>>>> 4GB may not be enough for your current load.
>>> 
>>> Have you considered changing your memory allocation, giving less to your
>>>> map/reduce jobs and more to HBase?
>>>> 
>>>> 
>>> Interesting point. Can you advise on the relation between m/r memory allocation
>>> and the HBase region server?
>>> 
>>> Currently we have 512 MB per map task (4 maps per machine) and 1024 MB per
>>> reduce task (2 reducers per machine).
>>> 
>>> 
>>>> What is your key distribution like?
>>> 
>>> Are you writing to all regions equally, or are you hotspotting on one
>>>> region?
>>>> 
>>> 
>>> Every day, before running the job, we manually allocate regions with
>>> lexicographic start and end keys to get a good distribution and
>>> prevent hot-spots.
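
For reference, a minimal sketch of that kind of manual pre-splitting with the Java
client, assuming the createTable(desc, splitKeys) overload is available in this 0.90
build; the table name and split points below are placeholders, not the real key space:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitDailyTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor desc = new HTableDescriptor("events_20110824");
        desc.addFamily(new HColumnDescriptor("f"));

        // Explicit split points chosen from the key space so that each region
        // receives a comparable share of the day's writes (no hot region).
        byte[][] splits = new byte[][] {
            Bytes.toBytes("2"), Bytes.toBytes("4"),
            Bytes.toBytes("6"), Bytes.toBytes("8"),
        };

        // Creates splits.length + 1 regions up front, before any data arrives.
        admin.createTable(desc, splits);
    }
}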
>>> 
>>> 
>>>> 
>>>> Check your cell/row sizes.  Are they really large (e.g. cells > 1 MB;
>>>> rows > 100 MB)?  Increasing region size should help here, but there may be
>>>> an issue with your RAM allocation for HBase.
>>>> 
>>>> 
>>> I'll check, but I am almost sure that we have no rows > 100 MB. We changed the
>>> region size to 500 MB to prevent automatic splits (after a successful insert
>>> job we have ~200-250 MB files per region),
>>> and for the next day we allocate a new one.
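
A minimal sketch of expressing that 500 MB region size as a per-table setting at
creation time (the same effect can come from hbase.hregion.max.filesize in
hbase-site.xml); the table and family names are placeholders:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class LargeRegionTable {
    public static void main(String[] args) throws Exception {
        HTableDescriptor desc = new HTableDescriptor("events_daily");
        desc.addFamily(new HColumnDescriptor("f"));

        // Per-table override: a region is not split until a store grows past
        // ~500 MB, so a day's ~200-250 MB per region never triggers a split.
        desc.setMaxFileSize(500L * 1024 * 1024);

        Configuration conf = HBaseConfiguration.create();
        new HBaseAdmin(conf).createTable(desc);
    }
}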
>>> 
>>> 
>>>> Are you sure that you are not overloading the machine memory? How much
>> RAM
>>>> do you allocate for map reduce jobs?
>>>> 
>>>> 
>>>    512 MB -- map
>>>    1024 MB -- reduce
>>> 
>>> 
>>>> How do you distribute your processes over machines?  Does your master
>> run
>>>> namenode, hmaster, jobtracker, and zookeeper, while your slaves run
>>>> datanode, tasktracker, and hregionserver?
>>> 
>>> 
>>> Exactly, we have that process distribution.
>>> We have 16 GB on the ordinary machines
>>> and 48 GB of RAM on the master, so I am not sure that I understand your
>>> calculation; please clarify.
>>> 
>>> If so, then your memory allocation is:
>>>> 4 GB for regionserver
>>>> 1 GB for OS
>>>> 1 GB for datanode
>>>> 1 GB for tasktracker
>>>> 9/6 GB for M/R
>>>> So, are you sure that all of your m/r tasks take less than 1 GB?
>>>> 
>>>> Dave
>>>> 
>>>> -----Original Message-----
>>>> From: Oleg Ruchovets [mailto:oruchovets@gmail.com]
>>>> Sent: Tuesday, August 23, 2011 2:15 AM
>>>> To: user@hbase.apache.org
>>>> Subject: how to make tuning for hbase (every couple of days hbase
>> region
>>>> server/s crash)
>>>> 
>>>> Hi,
>>>> 
>>>> Our environment:
>>>> HBase 0.90.2 (10 machines)
>>>>   We have a 10-machine grid:
>>>>   the master has 48 GB RAM,
>>>>   slave machines have 16 GB RAM,
>>>>   the RegionServer process has 4 GB RAM,
>>>>   the ZooKeeper process has 2 GB RAM.
>>>>   We have 4 mappers / 2 reducers per machine.
>>>> 
>>>> 
>>>> We write from m/r jobs to HBase (2 jobs a day). For 3 months the system worked
>>>> without any problem, but now every 3-4 days a region server crashes.
>>>>  What we have done so far (see the sketch below):
>>>>  1) We run a major compaction manually once a day.
>>>>  2) We increased the region size to prevent automatic splits.
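
For the daily manual major compaction, a minimal sketch using the Java admin API (the
shell's major_compact command does the same); the table name is a placeholder:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class DailyMajorCompaction {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Queues a major compaction for every region of the table; the request
        // is asynchronous and the region servers do the actual work.
        admin.majorCompact("events_daily");
    }
}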
>>>> 
>>>> Questions:
>>>>  What is the right way to tune HBase?
>>>>  How should we debug such a problem? It is still not clear to me what
>>>> the root cause of the region server crashes is.
>>>> 
>>>> 
>>>> 
>>>>  We started from this post.
>>>> 
>>>> 
>> http://search-hadoop.com/m/HDoK22ikTCI/M%252FR+vs+hbase+problem+in+production&subj=M+R+vs+hbase+problem+in+production
>>>> 
>>>> Regards
>>>> Oleg.
>>>> 
>> 
>> 

