mesos-dev mailing list archives

From "Tom Arnfeld" <...@duedil.com>
Subject Re: CPU soft lock up on mesos-slave
Date Mon, 31 Aug 2015 23:16:35 GMT
Hi Chris,

Perhaps you've run into https://community.nitrous.io/posts/stability-and-a-linux-oom-killer-bug.
We ran into symptoms similar to the ones you've described, and addressing that bug as the cause
solved all of our issues.

Hope this helps!

--

Tom Arnfeld

Developer // DueDil

(+44) 7525940046

25 Christopher Street, London, EC2A 2BS

On Mon, Aug 31, 2015 at 11:55 PM, Christopher Ketchum <cketchum@ucsc.edu> wrote:

> Hi all,
> I was running a Mesos cluster on EC2 with c4.8xlarge instance types when
> one of the status checks failed. We are running Mesos 0.22.1 on Ubuntu
> 14.04, with kernel version 3.13.0-55-generic. EC2 gave us this console
> output[1]. I did some searching and found similar issues reported here[2]
> on lkml, though those logs indicated a specific task and an older kernel,
> while these logs just show mesos-slave as the causative process.
> Unfortunately, the instance was terminated so I'm not sure how much useful
> debugging can be done. Is this a known issue? We are also using our own
> Python executor; could an error there have caused this?
> [1] http://pastebin.com/NgHi8MnS
> [2] https://lkml.org/lkml/2014/9/30/498
> Thanks,
> Chris
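For anyone going through a similar console dump, the process the kernel blames for a soft lockup can be pulled out of the watchdog lines with a short script. This is a minimal Python sketch, assuming the 3.13-era message format `BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]`; the function names and the sample line in the usage note are hypothetical and are not taken from the pastebin output above.

```python
import re

# Assumed kernel watchdog format, e.g.
# "BUG: soft lockup - CPU#12 stuck for 22s! [mesos-slave:2345]".
# search() also matches lines carrying a timestamp or "watchdog: " prefix.
SOFT_LOCKUP = re.compile(
    r"BUG: soft lockup - CPU#(\d+) stuck for (\d+)s! \[(.+?):(\d+)\]"
)


def parse_soft_lockups(console_text):
    """Return (cpu, seconds, comm, pid) tuples, one per soft lockup line."""
    results = []
    for line in console_text.splitlines():
        match = SOFT_LOCKUP.search(line)
        if match:
            cpu, secs, comm, pid = match.groups()
            results.append((int(cpu), int(secs), comm, int(pid)))
    return results


def count_by_process(lockups):
    """Count how many soft lockup reports name each process (comm)."""
    counts = {}
    for _cpu, _secs, comm, _pid in lockups:
        counts[comm] = counts.get(comm, 0) + 1
    return counts
```

Feeding it a saved console log (for example `parse_soft_lockups(open("console.txt").read())`, where the file name is hypothetical) shows whether the lockups always land on mesos-slave or are spread across other tasks. Keep in mind that the named process is simply whatever was running on the stuck CPU, so it is not necessarily the root cause.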