mesos-dev mailing list archives

From "Dominic Hamon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-1404) Glibc 'fork()' is not async signal safe
Date Tue, 27 May 2014 21:40:02 GMT

    [ https://issues.apache.org/jira/browse/MESOS-1404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010318#comment-14010318 ]

Dominic Hamon commented on MESOS-1404:
--------------------------------------

The direct-syscall approach (invoking fork through syscall() to bypass libc) may not work on OS X: http://stackoverflow.com/questions/11301681/how-do-i-call-fork-directly-bypassing-libc
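
For reference, a minimal sketch of what bypassing the libc fork() wrapper could look like on Linux. This is an illustration only, not the patch under discussion: `rawFork` and `childExitCodeViaRawFork` are hypothetical names, the non-Linux branch simply falls back to fork(), and the child must still restrict itself to async-signal-safe calls.

```cpp
#include <cassert>
#include <csignal>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not from Mesos): create a child process with a raw
// system call, skipping the glibc fork() wrapper seen in the stack trace.
inline pid_t rawFork()
{
#if defined(__linux__) && defined(SYS_fork)
  return static_cast<pid_t>(::syscall(SYS_fork));
#elif defined(__linux__)
  // Some architectures (e.g. aarch64) have no fork syscall, only clone.
  return static_cast<pid_t>(::syscall(SYS_clone, SIGCHLD, 0, 0, 0, 0));
#else
  // Assumption: no reliable raw syscall here (e.g. OS X), so use libc.
  return ::fork();
#endif
}

// Hypothetical demo: the child exits immediately with `code`; the parent
// returns the exit code it observed, or -1 on any failure.
inline int childExitCodeViaRawFork(int code)
{
  pid_t pid = rawFork();
  if (pid < 0) {
    return -1;
  }

  if (pid == 0) {
    // Child: only async-signal-safe calls. _exit() skips atexit handlers.
    ::_exit(code);
  }

  int status = 0;
  if (::waitpid(pid, &status, 0) != pid) {
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling `childExitCodeViaRawFork(7)` should return 7 on a Linux host. Whether this is safe inside a real child of clone() depends on what else the child does after the call.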


> Glibc 'fork()' is not async signal safe
> ---------------------------------------
>
>                 Key: MESOS-1404
>                 URL: https://issues.apache.org/jira/browse/MESOS-1404
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Jie Yu
>            Assignee: Jie Yu
>
> glibc does not implement 'fork()' in an async-signal-safe way, although according to POSIX it should be. When the child tries to execute the commands returned from the isolator's prepare(), it uses os::system, which calls 'fork()'.
> I observed this stack trace while debugging a deadlock:
> {noformat}
> (gdb) bt
> #0  0x00007f8fb2d5d2ce in __lll_lock_wait_private () from /lib64/libc.so.6
> #1  0x00007f8fb2ce1d8e in _L_lock_44 () from /lib64/libc.so.6
> #2  0x00007f8fb2cdab4c in ptmalloc_lock_all () from /lib64/libc.so.6
> #3  0x00007f8fb2d11d65 in fork () from /lib64/libc.so.6
> #4  0x00007f8fb4e898de in system (command=..., directory=<value optimized out>, envp=..., uid=0, gid=0, redirectIO=<value optimized out>, pipeRead=29, pipeWrite=30,
>     commands=std::list = {...}) at ../../../mesos/3rdparty/libprocess/3rdparty/stout/include/stout/os.hpp:558
> #5  mesos::internal::slave::execute (command=..., directory=<value optimized out>, envp=..., uid=0, gid=0, redirectIO=<value optimized out>, pipeRead=29, pipeWrite=30,
>     commands=std::list = {...}) at ../../../mesos/src/slave/containerizer/mesos_containerizer.cpp:483
> #6  0x00007f8fb4e97bab in __call<, 0, 1, 2, 3, 4, 5, 6, 7, 8> (__functor=<value optimized out>)
>     at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/tr1_impl/functional:1137
> #7  operator()<> (__functor=<value optimized out>) at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/tr1_impl/functional:1191
> #8  std::tr1::_Function_handler<int(), std::tr1::_Bind<int (*(mesos::CommandInfo, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, os::ExecEnv, unsigned int, unsigned int, bool, int, int, std::list<Option<mesos::CommandInfo>, std::allocator<Option<mesos::CommandInfo> > >))(const mesos::CommandInfo&, const std::string&, const os::ExecEnv&, uid_t, gid_t, bool, int, int, const std::list<Option<mesos::CommandInfo>, std::allocator<Option<mesos::CommandInfo> > >&)> >::_M_invoke(const std::tr1::_Any_data &) (__functor=<value optimized out>) at /usr/lib/gcc/x86_64-redhat-linux/4.4.7/../../../../include/c++/4.4.7/tr1_impl/functional:1654
> #9  0x00007f8fb4fcaebe in mesos::internal::slave::_childMain(const std::tr1::function<int()> &, int *) (childFunction=..., pipes=0x7f8fad4f0040)
>     at ../../../mesos/src/slave/containerizer/linux_launcher.cpp:193
> #10 0x00007f8fb2d4db6d in clone () from /lib64/libc.so.6
> (gdb) info thread
> * 1 Thread 0x7f8fad4f1700 (LWP 62980)  0x00007f8fb2d5d2ce in __lll_lock_wait_private () from /lib64/libc.so.6
> {noformat}
> This stack trace matches the one discussed in the glibc issue tracker:
> https://sourceware.org/bugzilla/show_bug.cgi?id=4737
> The glibc maintainers marked that issue as "WONTFIX". Here is part of the discussion:
> {noformat}
> The Austin group met yesterday and retained the decision to interpret fork as
> async-signal-unsafe with future specifications mandating that posix_spawn be
> made async-signal-safe to fill the functionality gap.  Minutes of the meeting
> are available at https://www.opengroup.org/austin/docs/austin_446.txt.
> I think this bug can now be closed as "WONTFIX"
> {noformat}
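
Since the quoted discussion points to posix_spawn as the intended replacement, here is a minimal sketch of running a shell command through it rather than through fork() plus exec. This is an assumption-laden illustration, not Mesos code: `spawnShell` is a hypothetical helper, and the quote only says future specifications will mandate async-signal-safety for posix_spawn, so this does not by itself guarantee that the glibc version in the trace avoids the same lock.

```cpp
#include <cassert>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char** environ;

// Hypothetical helper (not from Mesos): run `command` via /bin/sh using
// posix_spawn. Returns the command's exit code, or -1 on any failure.
inline int spawnShell(const char* command)
{
  char* const argv[] = {
    const_cast<char*>("sh"),
    const_cast<char*>("-c"),
    const_cast<char*>(command),
    nullptr
  };

  pid_t pid = 0;

  // posix_spawn reports failure through its return value, not errno.
  if (::posix_spawn(&pid, "/bin/sh", nullptr, nullptr, argv, environ) != 0) {
    return -1;
  }

  int status = 0;
  if (::waitpid(pid, &status, 0) != pid) {
    return -1;
  }
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `spawnShell("exit 3")` should return 3 on a system with /bin/sh. Passing null file actions and attributes keeps the sketch short; pipe redirection like the pipeRead/pipeWrite arguments in the trace would need posix_spawn_file_actions_adddup2 and is left out here.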



--
This message was sent by Atlassian JIRA
(v6.2#6252)
