hbase-user mailing list archives

From "David chen" <c77...@163.com>
Subject Re:Re: How to know the root reason to cause RegionServer OOM?
Date Mon, 18 May 2015 03:08:17 GMT
The snippet from /var/log/messages is as follows. I am sure that the killed process (22827) is the RegionServer.
......
May 14 12:00:38 localhost kernel: Mem-Info:
May 14 12:00:38 localhost kernel: Node 0 DMA per-cpu:
May 14 12:00:38 localhost kernel: CPU    0: hi:    0, btch:   1 usd:   0
......
May 14 12:00:38 localhost kernel: CPU   39: hi:    0, btch:   1 usd:   0
May 14 12:00:38 localhost kernel: Node 0 DMA32 per-cpu:
May 14 12:00:38 localhost kernel: CPU    0: hi:  186, btch:  31 usd:  30
......
May 14 12:00:38 localhost kernel: CPU   39: hi:  186, btch:  31 usd:   8
May 14 12:00:38 localhost kernel: Node 0 Normal per-cpu:
May 14 12:00:38 localhost kernel: CPU    0: hi:  186, btch:  31 usd:   5
......
May 14 12:00:38 localhost kernel: CPU   39: hi:  186, btch:  31 usd:  20
May 14 12:00:38 localhost kernel: Node 1 Normal per-cpu:
May 14 12:00:38 localhost kernel: CPU    0: hi:  186, btch:  31 usd:   7
......
May 14 12:00:38 localhost kernel: CPU   39: hi:  186, btch:  31 usd:  10
May 14 12:00:38 localhost kernel: active_anon:7993118 inactive_anon:48001 isolated_anon:0
May 14 12:00:38 localhost kernel: active_file:855 inactive_file:960 isolated_file:0
May 14 12:00:38 localhost kernel: unevictable:0 dirty:0 writeback:0 unstable:0
May 14 12:00:38 localhost kernel: free:39239 slab_reclaimable:14043 slab_unreclaimable:27993
May 14 12:00:38 localhost kernel: mapped:48750 shmem:75053 pagetables:20540 bounce:0
May 14 12:00:38 localhost kernel: Node 0 DMA free:15732kB min:40kB low:48kB high:60kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15336kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
May 14 12:00:38 localhost kernel: lowmem_reserve[]: 0 3211 16088 16088
May 14 12:00:38 localhost kernel: Node 0 DMA32 free:60388kB min:8968kB low:11208kB high:13452kB active_anon:2811676kB inactive_anon:72kB active_file:0kB inactive_file:788kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3288224kB mlocked:0kB dirty:0kB writeback:44kB mapped:156kB shmem:8232kB slab_reclaimable:10652kB slab_unreclaimable:5144kB kernel_stack:56kB pagetables:4252kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1312 all_unreclaimable? yes
May 14 12:00:38 localhost kernel: lowmem_reserve[]: 0 0 12877 12877
May 14 12:00:38 localhost kernel: Node 0 Normal free:35772kB min:35964kB low:44952kB high:53944kB active_anon:13062472kB inactive_anon:4864kB active_file:1268kB inactive_file:1504kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:13186560kB mlocked:0kB dirty:0kB writeback:92kB mapped:6172kB shmem:51928kB slab_reclaimable:22732kB slab_unreclaimable:73204kB kernel_stack:16240kB pagetables:38040kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:10268 all_unreclaimable? yes
May 14 12:00:38 localhost kernel: lowmem_reserve[]: 0 0 0 0
May 14 12:00:38 localhost kernel: Node 1 Normal free:45064kB min:45132kB low:56412kB high:67696kB active_anon:16098324kB inactive_anon:187068kB active_file:2192kB inactive_file:1548kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:16547840kB mlocked:0kB dirty:116kB writeback:0kB mapped:188672kB shmem:240052kB slab_reclaimable:22788kB slab_unreclaimable:33624kB kernel_stack:7352kB pagetables:39868kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:12064 all_unreclaimable? yes
May 14 12:00:38 localhost kernel: lowmem_reserve[]: 0 0 0 0
May 14 12:00:38 localhost kernel: Node 0 DMA: 1*4kB 0*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15732kB
May 14 12:00:38 localhost kernel: Node 0 DMA32: 659*4kB 576*8kB 485*16kB 338*32kB 208*64kB 106*128kB 27*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB = 60636kB
May 14 12:00:38 localhost kernel: Node 0 Normal: 1166*4kB 579*8kB 337*16kB 203*32kB 106*64kB 61*128kB 3*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB = 37568kB
May 14 12:00:38 localhost kernel: Node 1 Normal: 668*4kB 405*8kB 422*16kB 259*32kB 176*64kB 67*128kB 7*256kB 2*512kB 0*1024kB 0*2048kB 0*4096kB = 43608kB
May 14 12:00:38 localhost kernel: 78257 total pagecache pages
May 14 12:00:38 localhost kernel: 0 pages in swap cache
May 14 12:00:38 localhost kernel: Swap cache stats: add 0, delete 0, find 0/0
May 14 12:00:38 localhost kernel: Free swap  = 0kB
May 14 12:00:38 localhost kernel: Total swap = 0kB
May 14 12:00:38 localhost kernel: 8388607 pages RAM
May 14 12:00:38 localhost kernel: 181753 pages reserved
May 14 12:00:38 localhost kernel: 77957 pages shared
May 14 12:00:38 localhost kernel: 8104642 pages non-shared
May 14 12:00:38 localhost kernel: [ pid ]   uid  tgid total_vm      rss cpu oom_adj oom_score_adj name
......
May 14 12:00:38 localhost kernel: [22827]   483 22827  4392305  4074129  23       0             0 java
May 14 12:00:38 localhost kernel: [38727]   483 38727   428355    74385  22       0             0 java
......
May 14 12:00:38 localhost kernel: Out of memory: Kill process 22827 (java) score 497 or sacrifice child
May 14 12:00:38 localhost kernel: Killed process 22827, UID 483, (java) total-vm:17569220kB, anon-rss:16296276kB, file-rss:240kB
May 14 12:00:38 localhost kernel: sleep invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
May 14 12:00:38 localhost kernel: sleep cpuset=/ mems_allowed=0-1
May 14 12:00:38 localhost kernel: Pid: 31136, comm: sleep Not tainted 2.6.32-358.el6.x86_64 #1
May 14 12:00:38 localhost kernel: Call Trace:
May 14 12:00:38 localhost kernel: [<ffffffff810cb5d1>] ? cpuset_print_task_mems_allowed+0x91/0xb0
May 14 12:00:38 localhost kernel: [<ffffffff8111cd10>] ? dump_header+0x90/0x1b0
May 14 12:00:38 localhost kernel: [<ffffffff810e91ee>] ? __delayacct_freepages_end+0x2e/0x30
May 14 12:00:38 localhost kernel: [<ffffffff8121d0bc>] ? security_real_capable_noaudit+0x3c/0x70
May 14 12:00:38 localhost kernel: [<ffffffff8111d192>] ? oom_kill_process+0x82/0x2a0
May 14 12:00:38 localhost kernel: [<ffffffff8111d0d1>] ? select_bad_process+0xe1/0x120
May 14 12:00:38 localhost kernel: [<ffffffff8111d5d0>] ? out_of_memory+0x220/0x3c0
May 14 12:00:38 localhost kernel: [<ffffffff8112c27c>] ? __alloc_pages_nodemask+0x8ac/0x8d0
May 14 12:00:38 localhost kernel: [<ffffffff8116087a>] ? alloc_pages_current+0xaa/0x110
May 14 12:00:38 localhost kernel: [<ffffffff8111a0f7>] ? __page_cache_alloc+0x87/0x90
May 14 12:00:38 localhost kernel: [<ffffffff81119ade>] ? find_get_page+0x1e/0xa0
May 14 12:00:38 localhost kernel: [<ffffffff8111b0b7>] ? filemap_fault+0x1a7/0x500
May 14 12:00:38 localhost kernel: [<ffffffff811430b4>] ? __do_fault+0x54/0x530
May 14 12:00:38 localhost kernel: [<ffffffff81059784>] ? find_busiest_group+0x244/0x9f0
May 14 12:00:38 localhost kernel: [<ffffffff81143687>] ? handle_pte_fault+0xf7/0xb50
May 14 12:00:38 localhost kernel: [<ffffffff8105e203>] ? perf_event_task_sched_out+0x33/0x80
May 14 12:00:38 localhost kernel: [<ffffffff8114431a>] ? handle_mm_fault+0x23a/0x310
May 14 12:00:38 localhost kernel: [<ffffffff810474c9>] ? __do_page_fault+0x139/0x480
May 14 12:00:38 localhost kernel: [<ffffffff8109be2f>] ? hrtimer_try_to_cancel+0x3f/0xd0
May 14 12:00:38 localhost kernel: [<ffffffff8109bee2>] ? hrtimer_cancel+0x22/0x30
May 14 12:00:38 localhost kernel: [<ffffffff8150f1b3>] ? do_nanosleep+0x93/0xc0
May 14 12:00:38 localhost kernel: [<ffffffff8109bfb4>] ? hrtimer_nanosleep+0xc4/0x180
May 14 12:00:38 localhost kernel: [<ffffffff8109ae00>] ? hrtimer_wakeup+0x0/0x30
May 14 12:00:38 localhost kernel: [<ffffffff8151311e>] ? do_page_fault+0x3e/0xa0
May 14 12:00:38 localhost kernel: [<ffffffff815104d5>] ? page_fault+0x25/0x30
......
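As a sanity check on the log above: the kernel's process table reports total_vm and rss in 4 kB pages, while the "Killed process" line reports kB, and the two agree for pid 22827. A minimal Python sketch of the arithmetic (page size assumed to be 4 kB, as on x86_64):

```python
PAGE_KB = 4  # x86_64 page size in kB

# Values for pid 22827 from the kernel's process table (in pages)
total_vm_pages = 4392305
rss_pages = 4074129

# Matches total-vm:17569220kB in the kill message
print(total_vm_pages * PAGE_KB)

# Matches anon-rss:16296276kB + file-rss:240kB = 16296516 kB
print(rss_pages * PAGE_KB)
```

So the java process alone held roughly 16 GB of resident memory, and with "Total swap = 0kB" there was nothing left to reclaim when the allocation at 12:00:38 failed.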

At 2015-05-16 02:39:02, "iain wright" <iainwrig@gmail.com> wrote:
>What log is this seen in? Can you paste the log line? Do you mean
>/var/log/messages?
>On May 12, 2015 7:44 PM, "David chen" <c77_cn@163.com> wrote:
>
>> A RegionServer was killed because of OutOfMemory (OOM). Although the killed
>> process can be seen in the Linux message log, I still have two questions:
>> 1. How can I find the root cause of the OOM?
>> 2. When the RegionServer encounters OOM, why can't it free some of the
>> memory it occupies? If it could, would the OOM killer even be needed?
>> Any ideas are appreciated!
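For question 1, the OOM-killer entries in /var/log/messages themselves carry the key evidence: which process the kernel chose, its resident size, and (from the Mem-Info dump) whether swap was exhausted. A small, hypothetical Python sketch for extracting those fields; the regex follows the 2.6.32-era log format quoted above, and `sample` here is just one line from this thread:

```python
import re

# Matches lines like:
#   Killed process 22827, UID 483, (java) total-vm:17569220kB, anon-rss:16296276kB, file-rss:240kB
KILLED = re.compile(
    r"Killed process (?P<pid>\d+), UID (?P<uid>\d+), \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<vm>\d+)kB, anon-rss:(?P<anon>\d+)kB, file-rss:(?P<file>\d+)kB"
)

def parse_oom_kills(text):
    """Return one dict per 'Killed process ...' line found in the log text."""
    return [
        {k: (v if k == "comm" else int(v)) for k, v in m.groupdict().items()}
        for m in KILLED.finditer(text)
    ]

sample = ("May 14 12:00:38 localhost kernel: Killed process 22827, UID 483, "
          "(java) total-vm:17569220kB, anon-rss:16296276kB, file-rss:240kB")
print(parse_oom_kills(sample))
```

In practice you would feed it the whole messages file (e.g. `parse_oom_kills(open("/var/log/messages").read())`). Note this is the kernel OOM killer, not a Java OutOfMemoryError: the JVM never sees an allocation failure it could react to, which is one answer to question 2; the usual mitigation is to keep the JVM heap plus off-heap usage below physical RAM.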