mesos-issues mailing list archives

From "Benjamin Mahler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MESOS-2451) mesos c++ zookeeper code hangs from api operation from within watcher of CHANGE event
Date Thu, 23 Nov 2017 03:11:00 GMT

    [ https://issues.apache.org/jira/browse/MESOS-2451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16263736#comment-16263736 ]

Benjamin Mahler commented on MESOS-2451:
----------------------------------------

[~bord] [~benjaminhindman] I've filed MESOS-8255 for making the {{ZooKeeper}} class asynchronous.

I also filed MESOS-8256 to better handle deadlocks in libprocess.

> mesos c++ zookeeper code hangs from api operation from within watcher of CHANGE event
> -------------------------------------------------------------------------------------
>
>                 Key: MESOS-2451
>                 URL: https://issues.apache.org/jira/browse/MESOS-2451
>             Project: Mesos
>          Issue Type: Bug
>          Components: c++ api
>    Affects Versions: 0.22.0
>         Environment: red hat linux 6.5
>            Reporter: craig bordelon
>         Attachments: Makefile, bug.cpp, bug0.cpp, log.h
>
>
> We've observed that the Mesos 0.22.0-rc1 C++ ZooKeeper code appears to hang (two
> threads stuck in indefinite pthread condition waits) on a test case. As best we can tell,
> this is a Mesos issue and not an issue with the underlying Apache ZooKeeper C binding.
> (That is, we tried the same type of test case using the Apache ZooKeeper C binding
> directly and saw no issues.)
> This happens with a properly running ZooKeeper (standalone is sufficient).
> Here's how we hung it:
> We issue a Mesos zk set via
> int ZooKeeper::set(const std::string& path,
>                    const std::string& data,
>                    int version)
> Then, inside a Watcher, we handle the CHANGED event by issuing a Mesos zk get on
> the same path via
> int ZooKeeper::get(const std::string& path,
>                    bool watch,
>                    std::string* result,
>                    Stat* stat)
> We end up with two threads in the process, both in pthread_cond_wait:
> #0  0x000000334e20b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x00007f6664ee1cf5 in Gate::arrive (this=0x7f6140, old=0)
>     at ../../../3rdparty/libprocess/src/gate.hpp:82
> #2  0x00007f6664ecef6e in process::ProcessManager::wait (this=0x7f02e0, pid=...)
>     at ../../../3rdparty/libprocess/src/process.cpp:2476
> #3  0x00007f6664ed2ce9 in process::wait (pid=..., duration=...)
>     at ../../../3rdparty/libprocess/src/process.cpp:2958
> #4  0x00007f6664e90558 in process::Latch::await (this=0x7f6ba0, duration=...)
>     at ../../../3rdparty/libprocess/src/latch.cpp:49
> #5  0x00007f66649452cc in process::Future<int>::await (this=0x7fffa0fd9040, 
> duration=...)
>     at ../../3rdparty/libprocess/include/process/future.hpp:1156
> #6  0x00007f666493a04d in process::Future<int>::get (this=0x7fffa0fd9040)
>     at ../../3rdparty/libprocess/include/process/future.hpp:1167
> #7  0x00007f6664ab1aac in ZooKeeper::set (this=0x803ce0, path="/craig/mo", data=
> ...
> and
> #0  0x000000334e20b43c in pthread_cond_wait@@GLIBC_2.3.2 () from 
> /lib64/libpthread.so.0
> #1  0x00007f6664ee1cf5 in Gate::arrive (this=0x7f66380013f0, old=0)
>     at ../../../3rdparty/libprocess/src/gate.hpp:82
> #2  0x00007f6664ecef6e in process::ProcessManager::wait (this=0x7f02e0, pid=...)
>     at ../../../3rdparty/libprocess/src/process.cpp:2476
> #3  0x00007f6664ed2ce9 in process::wait (pid=..., duration=...)
>     at ../../../3rdparty/libprocess/src/process.cpp:2958
> #4  0x00007f6664e90558 in process::Latch::await (this=0x7f6638000d00, 
> duration=...)
>     at ../../../3rdparty/libprocess/src/latch.cpp:49
> #5  0x00007f66649452cc in process::Future<int>::await (this=0x7f66595fb6f0, 
> duration=...)
>     at ../../3rdparty/libprocess/include/process/future.hpp:1156
> #6  0x00007f666493a04d in process::Future<int>::get (this=0x7f66595fb6f0)
>     at ../../3rdparty/libprocess/include/process/future.hpp:1167
> #7  0x00007f6664ab18d3 in ZooKeeper::get (this=0x803ce0, path="/craig/mo", 
> watch=false,
> ....
> Separately, we have an "enhancement" suggestion: the Mesos C++ ZooKeeper API should
> use timed waits rather than blocking indefinitely for responses.
> In this case, though, we think the Mesos code is blocking on itself and not handling
> the responses.
> craig
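
For readers following the report, here is a minimal sketch of the general pattern described above: a handler that issues a second blocking request while the only thread able to serve that request is the one running the handler. It is not the Mesos or libprocess code, and whether this is the actual cause of MESOS-2451 is an assumption. SerialExecutor and nestedCallTimesOut are hypothetical names invented for illustration. The sketch uses a timed wait (as the reporter suggests) so that it terminates instead of hanging.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical single-threaded dispatcher: every task runs on one worker.
// This is an illustration only, not how libprocess is implemented.
class SerialExecutor {
public:
  SerialExecutor() : worker_([this] { run(); }) {}

  ~SerialExecutor() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_one();
    worker_.join();  // the worker drains any queued tasks before exiting
  }

  // Queue a task and return a future for its result.
  std::future<int> submit(std::function<int()> f) {
    auto task = std::make_shared<std::packaged_task<int()>>(std::move(f));
    std::future<int> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    return result;
  }

private:
  void run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // shutting down and nothing left to run
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> tasks_;
  bool done_ = false;
  std::thread worker_;  // declared last so the other members exist first
};

// Returns true if a blocking call made from inside a handler timed out,
// i.e. it would have waited forever with an untimed wait.
bool nestedCallTimesOut(std::chrono::milliseconds timeout) {
  SerialExecutor executor;
  std::future<int> outer = executor.submit([&executor, timeout] {
    // Stand-in for the watcher: issue a second request and block on it.
    std::future<int> inner = executor.submit([] { return 42; });
    // The worker is busy running this handler, so `inner` cannot start yet.
    return inner.wait_for(timeout) == std::future_status::timeout ? 1 : 0;
  });
  return outer.get() == 1;
}
```

Calling nestedCallTimesOut(std::chrono::milliseconds(100)) returns true in this sketch: the nested request only runs after the handler gives up and returns. Replacing wait_for with an untimed get() inside the handler would block the worker indefinitely, which resembles the two pthread_cond_wait stacks shown in the report.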



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
