accumulo-notifications mailing list archives

From "John Vines (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (ACCUMULO-1374) Sudden Death of master, gc, and tservers
Date Fri, 03 May 2013 23:18:22 GMT

     [ https://issues.apache.org/jira/browse/ACCUMULO-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Vines resolved ACCUMULO-1374.
----------------------------------

    Resolution: Invalid
      Assignee:     (was: Eric Newton)

PEBCAK

{code}
grep -i kill /var/log/syslog | tail
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.570901] Out of memory: Kill process 2318 (java) score 480 or sacrifice child
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.570931] Killed process 2318 (java) total-vm:5369512kB, anon-rss:3655040kB, file-rss:0kB
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.676155] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.676196]  [<ffffffff81119745>] oom_kill_process+0x85/0xb0
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.698754] Out of memory: Kill process 1342 (java) score 169 or sacrifice child
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.698776] Killed process 1342 (java) total-vm:3176364kB, anon-rss:1287772kB, file-rss:0kB
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.735364] java invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0, oom_score_adj=0
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.735403]  [<ffffffff81119745>] oom_kill_process+0x85/0xb0
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.758067] Out of memory: Kill process 1512 (java) score 60 or sacrifice child
May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.758093] Killed process 1512 (java) total-vm:2531416kB, anon-rss:461072kB, file-rss:0kB
{code}
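
Each "Killed process" line in the syslog output carries the PID, the process name, and the memory figures for one OOM-killer victim. Below is a minimal Python sketch for pulling those fields out of a syslog. It assumes each kernel message sits on a single line (as it does in the file itself) and that the message format matches the one shown here, which can differ between kernel versions. The function name parse_oom_kills and the dictionary keys are made up for illustration.

```python
import re

# Matches kernel messages of the form (assumed from the log excerpt):
#   "... Killed process 2318 (java) total-vm:5369512kB, anon-rss:3655040kB, file-rss:0kB"
# "Out of memory: Kill process ..." lines are deliberately not matched.
KILLED = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\)"
    r".*?total-vm:(?P<total_vm>\d+)kB"
    r".*?anon-rss:(?P<anon_rss>\d+)kB"
)


def parse_oom_kills(lines):
    """Return one dict per 'Killed process' message in an iterable of log lines."""
    kills = []
    for line in lines:
        match = KILLED.search(line)
        if match:
            kills.append({
                "pid": int(match.group("pid")),
                "name": match.group("name"),
                "total_vm_kb": int(match.group("total_vm")),
                "anon_rss_kb": int(match.group("anon_rss")),
            })
    return kills


if __name__ == "__main__":
    # Sample line copied from the excerpt; a real run would iterate over
    # an open log file instead, e.g. open("/var/log/syslog").
    sample = [
        "May  3 22:01:59 ip-10-10-1-122 kernel: [1749277.570931] "
        "Killed process 2318 (java) total-vm:5369512kB, "
        "anon-rss:3655040kB, file-rss:0kB",
    ]
    for kill in parse_oom_kills(sample):
        print(kill["pid"], kill["name"], kill["anon_rss_kb"], "kB anon-rss")
```

Matching the PIDs returned here against the PIDs of the gc, master, and tserver processes would show which Accumulo process each kill corresponds to.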
                
> Sudden Death of master, gc, and tservers
> ----------------------------------------
>
>                 Key: ACCUMULO-1374
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-1374
>             Project: Accumulo
>          Issue Type: Bug
>          Components: gc, master, tserver
>         Environment: 1.5, svn#1470047 & 1477382 - both in standalone instance on EC2 on Ubuntu and small cluster on bare metal CentOS
>            Reporter: John Vines
>            Priority: Blocker
>             Fix For: 1.5.0
>
>
> I wish I could provide more information. This has happened once on a bare metal centos cluster while running vanilla continuous ingest of svn#1470047. There was nothing reported in the logs when one of the tservers just died after the system had been up for ~1 day. The out and err files were sparse, and the master only reported that it had lost connection with the tserver at the point when the tserver just stopped logging (it was overnight, so this was not witnessed until morning).
> It recently happened again on a standalone instance on ec2 running ubuntu and svn#1477382. The instance had been running for ~7 hours. This time the gc, master, and tserver died. The gc died first, and then 2m:48s later the master died. 200ms later the tserver died. Again, there was no output in any of the out or err files for the processes. The logs also have no errors or warnings in them, just abrupt stops. The processes came up fine once restarted.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
