hbase-user mailing list archives

From Homer Strong <homer.str...@gmail.com>
Subject Re: RS memory leak?
Date Tue, 20 Dec 2011 01:53:54 GMT
After our weekend struggle, we ended up just dropping some tables that
we can rebuild with MR. Planning to merge smaller regions in the
immediate future. With fewer regions, the cluster started with no
issues.
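
For the record, a rough sketch of the kind of change we're looking at to
end up with fewer, larger regions, using the old-style HBaseAdmin /
HTableDescriptor client API (check it against your HBase version). The
table name and the 4 GB figure are placeholders, and raising MAX_FILESIZE
only stops further splitting; regions that already exist still have to be
merged separately (e.g. with the offline Merge utility):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class RaiseMaxFileSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // "mytable" is a placeholder; substitute the real table name.
    byte[] table = Bytes.toBytes("mytable");
    HTableDescriptor desc = admin.getTableDescriptor(table);

    // Raise the per-region split threshold (MAX_FILESIZE) from the small
    // default to something larger, e.g. 4 GB, so the table splits into
    // far fewer regions as it grows.
    desc.setMaxFileSize(4L * 1024 * 1024 * 1024);

    // Schema changes in this era require the table to be disabled first.
    admin.disableTable(table);
    admin.modifyTable(table, desc);
    admin.enableTable(table);
  }
}

The same threshold can also be set cluster-wide with
hbase.hregion.max.filesize in hbase-site.xml.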

Thanks for your suggestions!


On Sat, Dec 17, 2011 at 3:16 PM, Stack <stack@duboce.net> wrote:
> On Sat, Dec 17, 2011 at 1:29 PM, Homer Strong <homer.strong@gmail.com> wrote:
>> @Stack, we tried your suggestion for getting off the ground with an
>> extra RS. We added 1 more identical RS, and after balancing, killed
>> the extra one. The cluster remained stable for the night, but this
>> morning all 3 of our RSs had OOMs.
>>
>
> Sounds like you need more than 3 regionservers for your current load.
> Run with 4 or 5 for a while and use the time to work on merging your
> regions down to a smaller number -- run with many fewer per
> regionserver (make your regions bigger) -- and figure out why you are
> getting the OOME.
>
> What do you see occupying memory in the regionserver?   You have 700
> or so regions per server?  You have a block cache of what size?  And
> the indexes for storefiles are taking up how much heap (do you have
> wide keys)?  Are the cells large?
>
> You disabled swap, but is your memory overcommitted?  I.e., if you add up
> the memory used by all the processes on the box, is it greater than the
> physical memory?
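
(A rough, illustrative tally, assuming these xlarges are m1.xlarge
instances with about 15 GB of RAM -- the non-heap numbers here are
guesses, not measurements:

  10 GB   regionserver -Xmx
+ ~1 GB   JVM overhead beyond the heap (thread stacks, GC, native buffers)
+ ~1 GB   colocated DataNode heap
+ OS, page cache, monitoring, shells, etc.

That is already close to the physical limit, so anything extra resident
on the box would tip it into overcommit.)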
>
>
>
>> In the logs we find many entries like
>>
>> https://gist.github.com/eadb953fcadbeb302143
>>
>> followed by the RSs aborting due to OOMs. Could this maybe be related
>> to HBASE-4222?
>>
>
> What's happening on the datanodes?  E.g. 10.192.21.220:50010?  Look in
> its logs?  Why is the regionserver failing to sync?  See if you can
> figure it out.
>
> St.Ack
>
>> Thanks for your help!
>>
>>
>> On Fri, Dec 16, 2011 at 3:31 PM, Homer Strong <homer.strong@gmail.com> wrote:
>>> Thanks for the response! To add to our problem's description: it
>>> doesn't seem to be an absolute number of regions that triggers the
>>> memory overuse; we've seen it happen now with a wide range of region
>>> counts.
>>>
>>>> Just opening regions, it does this?
>>> Yes.
>>>
>>>> No load?
>>> Very low load, no requests.
>>>
>>>> No swapping?
>>> Swapping is disabled.
>>>
>>>
>>>> Bring up more xlarge instances and see if it gets you off the ground?
>>>> Then work on getting your number of regions down?
>>> We'll try this and get back in a couple minutes!
>>>
>>>
>>>
>>> On Fri, Dec 16, 2011 at 3:21 PM, Stack <stack@duboce.net> wrote:
>>>> On Fri, Dec 16, 2011 at 1:57 PM, Homer Strong <homer.strong@gmail.com> wrote:
>>>>> Whenever an RS is assigned a large (> 500-600) number of regions, the
>>>>> heap usage grows without bound. Then the RS constantly GCs and must be
>>>>> killed.
>>>>>
>>>>
>>>> Just opening regions, it does this?
>>>>
>>>> No load?
>>>>
>>>> No swapping?
>>>>
>>>> What JVM and what args for JVM?
>>>>
>>>>
>>>>> This is with 2000 regions over 3 RSs, each with a 10 GB heap. The RSs
>>>>> are EC2 xlarges; the master is on its own large. Datanodes and
>>>>> namenodes are adjacent to the RSs and master, respectively.
>>>>>
>>>>> Looks like a memory leak? Any suggestions would be appreciated.
>>>>>
>>>>
>>>> Bring up more xlarge instances and see if it gets you off the ground?
>>>> Then work on getting your number of regions down?
>>>>
>>>> St.Ack
