hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: How to know the root reason to cause RegionServer OOM?
Date Wed, 20 May 2015 15:12:58 GMT
On Wed, May 20, 2015 at 1:46 AM, David chen <c77_cn@163.com> wrote:

> Thanks Ted,
> For scenario #1, I cannot see any clue in the regionserver log file that
> a "kill -9" command was executed. Meanwhile, I think that when the JVM
> detects an OOME in the regionserver process, it creates a new thread to
> execute "kill -9 %p", and that thread does not write to the regionserver
> log, so the absence of clues in the regionserver log is normal. Right?
> For scenario #2, dmesg also did not provide any clues. But some clues were
> seen in /var/log/messages:
> ......
> May 14 12:00:38 localhost kernel: Out of memory: Kill process 22827 (java)
> score 497 or sacrifice child
> May 14 12:00:38 localhost kernel: Killed process 22827, UID 483, (java)
> total-vm:17569220kB, anon-rss:16296276kB, file-rss:240kB
> ......
> PID 22827 above is the regionserver.
> It looks like the regionserver itself ran out of memory
> (total-vm:17569220kB, anon-rss:16296276kB; the max heap size is set to
> 15G), so it was killed. Right?
>

Yes.


> But hbase has no heavy load in the cluster,


Doesn't matter. You allocated it a heap of 15G. The OS is looking for
memory and is at an extreme (swapping totally disabled?) so the kernel
OOM killer starts killing processes, and a 15G JVM scores high. This is
not an hbase issue. It is an oversubscription problem. Google how to
address.
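
For anyone reading this later, the numbers in the kernel line can be checked mechanically. The following is only a sketch: it hard-codes the "Killed process" line quoted above (rejoined onto one line), and it assumes the 15G heap means 15 GiB. The helper names are made up for illustration.

```python
import re

# "Killed process" line from the /var/log/messages excerpt quoted above,
# rejoined onto a single line (the mail client wrapped it).
KERNEL_LINE = (
    "May 14 12:00:38 localhost kernel: Killed process 22827, UID 483, (java) "
    "total-vm:17569220kB, anon-rss:16296276kB, file-rss:240kB"
)

# Assumption: the 15G max heap mentioned in the thread means 15 GiB.
HEAP_KIB = 15 * 1024 * 1024

KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+).*?"
    r"total-vm:(?P<total_vm>\d+)kB, "
    r"anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_killed_line(line):
    """Return pid and memory figures (in kB) from an OOM-killer log line."""
    match = KILLED_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


def kib_to_gib(kib):
    return kib / (1024 * 1024)


info = parse_killed_line(KERNEL_LINE)
rss_gib = kib_to_gib(info["anon_rss"])
beyond_heap_kib = info["anon_rss"] - HEAP_KIB

print(f"pid {info['pid']}: resident {rss_gib:.2f} GiB")
print(f"resident memory beyond a 15 GiB heap: {beyond_heap_kib} kB")
```

The resident size comes out a little above the configured heap, which is expected: a JVM also holds memory outside the Java heap. When sizing a box, budget for more than -Xmx per process.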


> so I don't think it was killed because of its own OOME; instead I think
> other applications were short of memory, so the OS killed the regionserver
> to make room for them.
> I currently have no evidence to prove my idea, so I hope for more help. Thanks.
>

You quoted all the necessary evidence above.

St.Ack


> At 2015-05-20 10:04:19, "Ted Yu" <yuzhihong@gmail.com> wrote:
> >For scenario #1, you would see in the regionserver.out file that "kill -9"
> >command was applied due to OOME.
> >
> >For scenario #2, can you see if dmesg provides some clue ?
> >
> >Cheers
> >
> >On Tue, May 19, 2015 at 6:32 PM, David chen <c77_cn@163.com> wrote:
> >
> >> Thanks for the replies, guys; they really helped me.
> >> Another question: I think there are two ways the RegionServer process
> >> can get killed:
> >> 1. When the JVM detects that the memory the RegionServer occupies exceeds
> >> the max heap size, the JVM itself runs the command configured by the
> >> option "-XX:OnOutOfMemoryError=kill -9 %p" to kill the RegionServer
> >> process.
> >> 2. The RegionServer process has not reached the max heap size, but a new
> >> application needs to allocate memory; if memory is short, the OS chooses
> >> some processes to kill, the RegionServer unfortunately is the first
> >> choice, and so it is killed by the OS.
> >> Is my understanding right? If so, how can I tell which case applies to
> >> me?
> >> Any ideas would be appreciated!
> >>
>
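
The two scenarios in the quoted question can be told apart by checking both places named in this thread: regionserver.out for the JVM hook and the kernel log for the OOM killer. A minimal sketch follows, assuming the caller has already read both files into strings. The function name and the marker strings it searches for are assumptions for illustration, not confirmed JVM or kernel output formats beyond what is quoted in this thread.

```python
import re

# Assumption: when the -XX:OnOutOfMemoryError hook fires, the .out file
# mentions either the option itself or the "kill -9" command it ran.
JVM_HOOK_RE = re.compile(r"OnOutOfMemoryError|kill -9")

# Matches kernel lines such as "Killed process 22827, UID 483, (java)".
KERNEL_KILL_RE = re.compile(r"Killed process (\d+)")


def classify_kill(pid, out_text, kernel_text):
    """Guess why the process with this pid died.

    out_text:    contents of the regionserver .out file
    kernel_text: contents of /var/log/messages (or dmesg output)
    """
    killed_pids = {int(p) for p in KERNEL_KILL_RE.findall(kernel_text)}
    if pid in killed_pids:
        return "os-oom-killer"   # scenario 2: the kernel picked the process
    if JVM_HOOK_RE.search(out_text):
        return "jvm-oome-hook"   # scenario 1: the JVM ran the configured command
    return "unknown"


kernel_log = (
    "May 14 12:00:38 localhost kernel: Killed process 22827, UID 483, (java) "
    "total-vm:17569220kB, anon-rss:16296276kB, file-rss:240kB"
)
print(classify_kill(22827, out_text="", kernel_text=kernel_log))
```

The kernel log is checked first because a "Killed process" line naming the PID is direct evidence, while the absence of a marker in the .out file proves nothing on its own.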
