hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: RS memory leak?
Date Sat, 17 Dec 2011 23:16:59 GMT
On Sat, Dec 17, 2011 at 1:29 PM, Homer Strong <homer.strong@gmail.com> wrote:
> @Stack, we tried your suggestion for getting off the ground with an
> extra RS. We added 1 more identical RS, and after balancing, killed
> the extra one. The cluster remained stable for the night, but this
> morning all 3 of our RSs had OOMs.
>

Sounds like you need more than 3 regionservers for your current load.
Run with 4 or 5 for a while and use the time to work on merging your
regions down to a smaller number -- run with many fewer per
regionserver (make your regions bigger) -- and figure out why you are
getting the OOMEs.
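
If you are on 0.90-era defaults, one way toward fewer, bigger regions
is raising the max region size -- something like this in
hbase-site.xml (the 1GB value below is just an example; tune it for
your data):

  <property>
    <name>hbase.hregion.max.filesize</name>
    <!-- default in 0.90 is 256MB; a bigger value means fewer splits -->
    <value>1073741824</value>
  </property>

There is also the offline merge tool
(org.apache.hadoop.hbase.util.Merge) for folding adjacent regions
together, but the cluster has to be down while it runs.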

What do you see occupying memory in the regionserver? You have 700
or so regions per server? What size is your block cache? And how much
heap are the storefile indexes taking up (do you have wide keys?)?
Are the cells large?
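
A quick way to see what is actually on the heap, assuming a stock JDK
on the box (<pid> here is a placeholder for the regionserver's pid):

  jps | grep HRegionServer            # find the regionserver pid
  jmap -histo:live <pid> | head -30   # top live heap consumers by class

Note that jmap -histo:live forces a full GC first, so expect a pause.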

You disabled swap, but is your memory overcommitted? I.e., if you add
up the memory used by all processes on the box, is it greater than
the physical memory?
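
A rough check (total resident set across all processes vs. physical
RAM):

  # sum resident memory of every process, in GB
  ps -eo rss= | awk '{sum+=$1} END {printf "%.1f GB resident\n", sum/1024/1024}'
  free -g   # compare against total physical memory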



> In the logs we find many entries like
>
> https://gist.github.com/eadb953fcadbeb302143
>
> Followed by the RSs aborting due to OOMs. Could this maybe be
> related to HBASE-4222?
>

What's happening on the datanodes, e.g. 10.192.21.220:50010? Look in
its logs. Why is the regionserver failing to sync? See if you can
figure it out.
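
On the datanode itself, something along these lines is a start (the
log location varies with your install):

  # errors around the time of the failed syncs
  grep -iE "error|exception" /var/log/hadoop/*datanode*.log* | tail -50

A common culprit at your region counts is running the datanodes out
of transceivers; check that dfs.datanode.max.xcievers is bumped up
(4096 is the usual recommendation) in hdfs-site.xml.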

St.Ack

> Thanks for your help!
>
>
> On Fri, Dec 16, 2011 at 3:31 PM, Homer Strong <homer.strong@gmail.com> wrote:
>> Thanks for the response! To add to our problem's description: it
>> doesn't seem to be an absolute number of regions that triggers the
>> memory overuse; we've seen it happen now with a wide range of region
>> counts.
>>
>>> Just opening regions, it does this?
>> Yes.
>>
>>> No load?
>> Very low load, no requests.
>>
>>> No swapping?
>> Swapping is disabled.
>>
>>
>>> Bring up more xlarge instances and see if that gets you off the ground?
>>> Then work on getting your number of regions down?
>> We'll try this and get back in a couple minutes!
>>
>>
>>
>> On Fri, Dec 16, 2011 at 3:21 PM, Stack <stack@duboce.net> wrote:
>>> On Fri, Dec 16, 2011 at 1:57 PM, Homer Strong <homer.strong@gmail.com> wrote:
>>>> Whenever an RS is assigned a large (> 500-600) number of regions, the
>>>> heap usage grows without bound. Then the RS constantly GCs and must be
>>>> killed.
>>>>
>>>
>>> Just opening regions, it does this?
>>>
>>> No load?
>>>
>>> No swapping?
>>>
>>> What JVM and what args for JVM?
>>>
>>>
>>>> This is with 2000 regions over 3 RSs, each with a 10 GB heap. The RSs
>>>> are EC2 xlarges; the master is on its own large. Datanodes and the
>>>> namenode are adjacent to the RSs and master, respectively.
>>>>
>>>> Looks like a memory leak? Any suggestions would be appreciated.
>>>>
>>>
>>> Bring up more xlarge instances and see if that gets you off the ground?
>>> Then work on getting your number of regions down?
>>>
>>> St.Ack
