mesos-issues mailing list archives

From "Bruce Merry (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-2369) Segfault when mesos-slave tries to clean up docker containers on startup
Date Fri, 10 Feb 2017 14:07:41 GMT

    [ https://issues.apache.org/jira/browse/MESOS-2369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861315#comment-15861315 ]

Bruce Merry commented on MESOS-2369:
------------------------------------

Confirmed on another machine, this time running Ubuntu 16.04, again with Mesos 1.1.0 from
the Mesosphere PPA. To reproduce (as root):
1. for i in {1..4000}; do docker run ubuntu:xenial-20161010 /bin/true; done
2. service mesos-slave stop
3. MESOS_RECOVER=cleanup mesos-init-wrapper slave
I found that 3000 wasn't enough to trigger the segfault, but 4000 was.

If I run "ulimit -s 32768" first, then the segfault does not occur, so it is presumably a
stack overflow.
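
For anyone who wants to check the limit from inside a process rather than from the shell, here is a minimal sketch using the POSIX getrlimit() call. The helper name soft_stack_limit_kib is made up for illustration and is not part of Mesos. It reports the soft RLIMIT_STACK in KiB, the same unit that bash's "ulimit -s" uses, so 32768 corresponds to 32 MiB.

```cpp
#include <sys/resource.h>

// Hypothetical helper (not a Mesos API): returns the soft RLIMIT_STACK
// in KiB, matching the unit bash's `ulimit -s` reports.
// Returns -1 if the limit is unlimited and -2 if getrlimit() fails.
long long soft_stack_limit_kib() {
  struct rlimit rl;
  if (getrlimit(RLIMIT_STACK, &rl) != 0) {
    return -2;
  }
  if (rl.rlim_cur == RLIM_INFINITY) {
    return -1;
  }
  return static_cast<long long>(rl.rlim_cur / 1024);
}
```

As far as I understand, glibc on Linux also uses the soft RLIMIT_STACK value at program start as the default stack size for new threads (unless it is unlimited). That would explain why raising it helps even though the crashing frames in the quoted backtrace sit on a thread created via start_thread rather than on the main thread.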

I'm not sure whether this is related to MESOS_RECOVER=cleanup at all; I included it since
that's where I first encountered the issue.

> Segfault when mesos-slave tries to clean up docker containers on startup
> ------------------------------------------------------------------------
>
>                 Key: MESOS-2369
>                 URL: https://issues.apache.org/jira/browse/MESOS-2369
>             Project: Mesos
>          Issue Type: Bug
>          Components: docker
>    Affects Versions: 0.21.1
>         Environment: Debian Jessie, mesos package 0.21.1-1.2.debian77 
> docker 1.3.2 build 39fa2fa
>            Reporter: Pas
>
> I did a gdb backtrace; it looks like a stack overflow caused by excessive recursion.
> The interesting aspect is that when I ran mesos-slave under strace -f -b execve, it
> successfully proceeded with the docker cleanup. However, there were a few strace sessions
> (on other slaves) where I was able to observe the SIGSEGV, and it occurred around (or a bit
> before) the "docker ps -a" call, because docker got a broken pipe shortly afterwards and was
> then killed by the propagating SIGSEGV signal.
> {code}
> ....
> #59296 0x00007ffff6e7cd98 in process::Future<std::string> process::Future<unsigned
long>::then<std::string>(std::tr1::function<process::Future<std::string>
(unsigned long const&)> const&) const () from /usr/local/lib/libmesos-0.21.1.so
> #59297 0x00007ffff6e4f5d3 in process::io::internal::_read(int, std::tr1::shared_ptr<std::string>
const&, boost::shared_array<char> const&, unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
> #59298 0x00007ffff6e5012c in process::io::internal::__read(unsigned long, int, std::tr1::shared_ptr<std::string>
const&, boost::shared_array<char> const&, unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
> #59299 0x00007ffff6e53000 in std::tr1::_Function_handler<process::Future<std::string>
(unsigned long const&), std::tr1::_Bind<process::Future<std::string> (*(std::tr1::_Placeholder<1>,
int, std::tr1::shared_ptr<std::string>, boost::shared_array<char>, unsigned long))(unsigned
long, int, std::tr1::shared_ptr<std::string> const&, boost::shared_array<char>
const&, unsigned long)> >::_M_invoke(std::tr1::_Any_data const&, unsigned long
const&) () from /usr/local/lib/libmesos-0.21.1.so
> #59300 0x00007ffff6e7d23b in void process::internal::thenf<unsigned long, std::string>(std::tr1::shared_ptr<process::Promise<std::string>
> const&, std::tr1::function<process::Future<std::string> (unsigned long const&)>
const&, process::Future<unsigned long> const&) ()
>    from /usr/local/lib/libmesos-0.21.1.so
> #59301 0x00007ffff689ee60 in process::Future<unsigned long>::onAny(std::tr1::function<void
(process::Future<unsigned long> const&)> const&) const () from /usr/local/lib/libmesos-0.21.1.so
> #59302 0x00007ffff6e7cd98 in process::Future<std::string> process::Future<unsigned
long>::then<std::string>(std::tr1::function<process::Future<std::string>
(unsigned long const&)> const&) const () from /usr/local/lib/libmesos-0.21.1.so
> #59303 0x00007ffff6e4f5d3 in process::io::internal::_read(int, std::tr1::shared_ptr<std::string>
const&, boost::shared_array<char> const&, unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
> #59304 0x00007ffff6e5012c in process::io::internal::__read(unsigned long, int, std::tr1::shared_ptr<std::string>
const&, boost::shared_array<char> const&, unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
> #59305 0x00007ffff6e53000 in std::tr1::_Function_handler<process::Future<std::string>
(unsigned long const&), std::tr1::_Bind<process::Future<std::string> (*(std::tr1::_Placeholder<1>,
int, std::tr1::shared_ptr<std::string>, boost::shared_array<char>, unsigned long))(unsigned
long, int, std::tr1::shared_ptr<std::string> const&, boost::shared_array<char>
const&, unsigned long)> >::_M_invoke(std::tr1::_Any_data const&, unsigned long
const&) () from /usr/local/lib/libmesos-0.21.1.so
> #59306 0x00007ffff6e7d23b in void process::internal::thenf<unsigned long, std::string>(std::tr1::shared_ptr<process::Promise<std::string>
> const&, std::tr1::function<process::Future<std::string> (unsigned long const&)>
const&, process::Future<unsigned long> const&) ()
>    from /usr/local/lib/libmesos-0.21.1.so
> #59307 0x00007ffff689ee60 in process::Future<unsigned long>::onAny(std::tr1::function<void
(process::Future<unsigned long> const&)> const&) const () from /usr/local/lib/libmesos-0.21.1.so
> #59308 0x00007ffff6e7cd98 in process::Future<std::string> process::Future<unsigned
long>::then<std::string>(std::tr1::function<process::Future<std::string>
(unsigned long const&)> const&) const () from /usr/local/lib/libmesos-0.21.1.so
> #59309 0x00007ffff6e4f5d3 in process::io::internal::_read(int, std::tr1::shared_ptr<std::string>
const&, boost::shared_array<char> const&, unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
> #59310 0x00007ffff6e5012c in process::io::internal::__read(unsigned long, int, std::tr1::shared_ptr<std::string>
const&, boost::shared_array<char> const&, unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
> #59311 0x00007ffff6e53000 in std::tr1::_Function_handler<process::Future<std::string>
(unsigned long const&), std::tr1::_Bind<process::Future<std::string> (*(std::tr1::_Placeholder<1>,
int, std::tr1::shared_ptr<std::string>, boost::shared_array<char>, unsigned long))(unsigned
long, int, std::tr1::shared_ptr<std::string> const&, boost::shared_array<char>
const&, unsigned long)> >::_M_invoke(std::tr1::_Any_data const&, unsigned long
const&) () from /usr/local/lib/libmesos-0.21.1.so
> #59312 0x00007ffff6e7d23b in void process::internal::thenf<unsigned long, std::string>(std::tr1::shared_ptr<process::Promise<std::string>
> const&, std::tr1::function<process::Future<std::string> (unsigned long const&)>
const&, process::Future<unsigned long> const&) ()
>    from /usr/local/lib/libmesos-0.21.1.so
> #59313 0x00007ffff689ee60 in process::Future<unsigned long>::onAny(std::tr1::function<void
(process::Future<unsigned long> const&)> const&) const () from /usr/local/lib/libmesos-0.21.1.so
> #59314 0x00007ffff6e7cd98 in process::Future<std::string> process::Future<unsigned
long>::then<std::string>(std::tr1::function<process::Future<std::string>
(unsigned long const&)> const&) const () from /usr/local/lib/libmesos-0.21.1.so
> #59315 0x00007ffff6e4f5d3 in process::io::internal::_read(int, std::tr1::shared_ptr<std::string>
const&, boost::shared_array<char> const&, unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
> #59316 0x00007ffff6e5012c in process::io::internal::__read(unsigned long, int, std::tr1::shared_ptr<std::string>
const&, boost::shared_array<char> const&, unsigned long) () from /usr/local/lib/libmesos-0.21.1.so
> #59317 0x00007ffff6e53000 in std::tr1::_Function_handler<process::Future<std::string>
(unsigned long const&), std::tr1::_Bind<process::Future<std::string> (*(std::tr1::_Placeholder<1>,
int, std::tr1::shared_ptr<std::string>, boost::shared_array<char>, unsigned long))(unsigned
long, int, std::tr1::shared_ptr<std::string> const&, boost::shared_array<char>
const&, unsigned long)> >::_M_invoke(std::tr1::_Any_data const&, unsigned long
const&) () from /usr/local/lib/libmesos-0.21.1.so
> #59318 0x00007ffff6e7d23b in void process::internal::thenf<unsigned long, std::string>(std::tr1::shared_ptr<process::Promise<std::string>
> const&, std::tr1::function<process::Future<std::string> (unsigned long const&)>
const&, process::Future<unsigned long> const&) ()
>    from /usr/local/lib/libmesos-0.21.1.so
> #59319 0x00007ffff6c0f138 in process::Future<unsigned long>::set(unsigned long
const&) () from /usr/local/lib/libmesos-0.21.1.so
> #59320 0x00007ffff6e46a64 in process::io::internal::read(int, void*, unsigned long, std::tr1::shared_ptr<process::Promise<unsigned
long> > const&, process::Future<short> const&) () from /usr/local/lib/libmesos-0.21.1.so
> #59321 0x00007ffff6e59508 in process::Future<short>::set(short const&) () from
/usr/local/lib/libmesos-0.21.1.so
> #59322 0x00007ffff6e59469 in process::Future<short>::set(short const&) () from
/usr/local/lib/libmesos-0.21.1.so
> #59323 0x00007ffff6e3422e in process::polled(ev_loop*, ev_io*, int) () from /usr/local/lib/libmesos-0.21.1.so
> #59324 0x00007ffff6ead365 in ev_invoke_pending (loop=0x7ffff7ddb460 <default_loop_struct>)
at ev.c:2994
> #59325 0x00007ffff6eb03c5 in ev_run (loop=0x7ffff7ddb460 <default_loop_struct>,
flags=<optimized out>) at ev.c:3394
> #59326 0x00007ffff6e3235b in process::serve(void*) () from /usr/local/lib/libmesos-0.21.1.so
> #59327 0x00007ffff474d0a4 in start_thread (arg=0x7fffebe27700) at pthread_create.c:309
> #59328 0x00007ffff4481ccd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
> {code}
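
The quoted backtrace repeats the same six frames over and over (thenf -> _M_invoke -> __read -> _read -> then -> onAny -> thenf), which suggests that each completed read runs the continuation for the next read inline instead of returning to the event loop first. The following is only a toy sketch of that pattern, not the actual libprocess code: Source, Depth, and the read_* functions are hypothetical names, and nesting depth is counted as a stand-in for stack usage. It contrasts the nested style with a plain loop over the same input.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a pipe whose reads always complete immediately.
struct Source {
  std::size_t remaining;  // bytes still to be read
};

// Records the deepest nesting reached, as a proxy for stack usage.
struct Depth {
  std::size_t current = 0;
  std::size_t max = 0;
};

// Nested style: finishing one chunk directly starts the next read,
// so every chunk adds another level of nesting.
void read_recursive(Source& src, std::size_t chunk, Depth& depth) {
  assert(chunk > 0);
  depth.current++;
  depth.max = std::max(depth.max, depth.current);
  if (src.remaining > 0) {
    src.remaining -= std::min(chunk, src.remaining);
    read_recursive(src, chunk, depth);  // continuation runs inline
  }
  depth.current--;
}

// Loop style: the same chunks are consumed at a constant nesting depth.
void read_iterative(Source& src, std::size_t chunk, Depth& depth) {
  assert(chunk > 0);
  depth.current++;
  depth.max = std::max(depth.max, depth.current);
  while (src.remaining > 0) {
    src.remaining -= std::min(chunk, src.remaining);
  }
  depth.current--;
}

// Convenience wrappers that return the deepest nesting observed.
std::size_t max_depth_recursive(std::size_t total, std::size_t chunk) {
  Source src{total};
  Depth depth;
  read_recursive(src, chunk, depth);
  return depth.max;
}

std::size_t max_depth_iterative(std::size_t total, std::size_t chunk) {
  Source src{total};
  Depth depth;
  read_iterative(src, chunk, depth);
  return depth.max;
}
```

In the nested version the depth grows linearly with the number of chunks, so a long enough "docker ps -a" output would eventually exhaust any fixed stack. That would be consistent with 4000 containers crashing where 3000 did not, and with a larger "ulimit -s" only moving the threshold rather than removing it.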



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
