mesos-user mailing list archives

From Niklas Nielsen <nik...@mesosphere.io>
Subject Re: mesos c++ zookeeper blocks indefinately -- any plans to enhance?
Date Mon, 16 Mar 2015 18:33:56 GMT
Hi Craig,

I am sorry you guys have been running into trouble with Zookeeper.
Have you filed a JIRA ticket where we can track the issues you are seeing?
That is how we track and schedule (human) resources for bug fixing :)

Thanks!
Niklas

On 4 March 2015 at 13:18, <pinktie@safe-mail.net> wrote:

> hi again mesos users and devs,
> In the prior post I described a program hanging in the mesos
> zookeeper c++ api and asked about an enhancement to not wait indefinitely
> when the underlying zookeeper responses don't arrive.
> At that time I thought perhaps the underlying zookeeper and/or its C
> binding might not be passing responses up to the mesos api callers.
> So, while the question is still outstanding, I now see that potentially
> the hanging issue is with the mesos implementation over zookeeper c binding.
> In particular i've now tried a similar scenario just with zookeeper c
> binding api.
> That is, I do a zk aget/complete from within a watcher handling the
> CHANGED event from a prior aset/complete.
> I don't see any indefinite blocking, and both the aget and aset
> completions are invoked and finish.
>
> Unless I'm not reproducing this properly, I conclude that this is bad
> behavior in the mesos c++ api.
> Somehow the mesos c++ zookeeper api implementation is getting itself into
> pthread condition waits with nothing to notify and break the waits.
> This seems to occur with get calls from a Watcher on CHANGED events.
>
> craig
>
> -------- Original Message --------
> From: pinktie@Safe-mail.net
> Apparently from: user-return-2761-pinktie=safe-mail.net@mesos.apache.org
> To: user@mesos.apache.org
> Subject: mesos c++ zookeeper blocks indefinately -- any plans to enhance?
> Date: Wed, 4 Mar 2015 10:05:54 -0500
>
> > hi mesos users and devs,
> > We've observed that the mesos 0.22.0-rc1 c++ zookeeper code appears
> > to allow indefinite waits on responses.
> > This leads to application hangs blocked inside mesos zookeeper calls.
> > This can happen with a properly running zookeeper presumably able to
> > make all responses.
> >
> > Here's how we hung it, for example.
> > We issue a mesos zk set via
> >
> > int ZooKeeper::set(const std::string& path,
> >                    const std::string& data,
> >                    int version)
> >
> > Then, inside a Watcher, while processing the CHANGED event we issue a
> > mesos zk get on the same path via
> >
> > int ZooKeeper::get(const std::string& path,
> >                    bool watch,
> >                    std::string* result,
> >                    Stat* stat)
> >
> > We end up with two threads in the process, both in pthread_cond_wait:
> > #0  0x000000334e20b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> > #1  0x00007f6664ee1cf5 in Gate::arrive (this=0x7f6140, old=0)
> >     at ../../../3rdparty/libprocess/src/gate.hpp:82
> > #2  0x00007f6664ecef6e in process::ProcessManager::wait (this=0x7f02e0, pid=...)
> >     at ../../../3rdparty/libprocess/src/process.cpp:2476
> > #3  0x00007f6664ed2ce9 in process::wait (pid=..., duration=...)
> >     at ../../../3rdparty/libprocess/src/process.cpp:2958
> > #4  0x00007f6664e90558 in process::Latch::await (this=0x7f6ba0, duration=...)
> >     at ../../../3rdparty/libprocess/src/latch.cpp:49
> > #5  0x00007f66649452cc in process::Future<int>::await (this=0x7fffa0fd9040, duration=...)
> >     at ../../3rdparty/libprocess/include/process/future.hpp:1156
> > #6  0x00007f666493a04d in process::Future<int>::get (this=0x7fffa0fd9040)
> >     at ../../3rdparty/libprocess/include/process/future.hpp:1167
> > #7  0x00007f6664ab1aac in ZooKeeper::set (this=0x803ce0, path="/craig/mo", data=
> > ...
> >
> > and
> > #0  0x000000334e20b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> > #1  0x00007f6664ee1cf5 in Gate::arrive (this=0x7f66380013f0, old=0)
> >     at ../../../3rdparty/libprocess/src/gate.hpp:82
> > #2  0x00007f6664ecef6e in process::ProcessManager::wait (this=0x7f02e0, pid=...)
> >     at ../../../3rdparty/libprocess/src/process.cpp:2476
> > #3  0x00007f6664ed2ce9 in process::wait (pid=..., duration=...)
> >     at ../../../3rdparty/libprocess/src/process.cpp:2958
> > #4  0x00007f6664e90558 in process::Latch::await (this=0x7f6638000d00, duration=...)
> >     at ../../../3rdparty/libprocess/src/latch.cpp:49
> > #5  0x00007f66649452cc in process::Future<int>::await (this=0x7f66595fb6f0, duration=...)
> >     at ../../3rdparty/libprocess/include/process/future.hpp:1156
> > #6  0x00007f666493a04d in process::Future<int>::get (this=0x7f66595fb6f0)
> >     at ../../3rdparty/libprocess/include/process/future.hpp:1167
> > #7  0x00007f6664ab18d3 in ZooKeeper::get (this=0x803ce0, path="/craig/mo", watch=false,
> > ....
> >
> > So, really we are asking whether the mesos zk c++ api will be enhanced
> > to stop blocking indefinitely when results do not arrive within a time bound.
> >
> > cheers
> > craig
>
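
Editorial sketch (not part of the original thread): the request above is for waits that give up after a time bound. The standard-library-only C++ below illustrates that idea under stated assumptions. It does not use Mesos, libprocess, or the ZooKeeper client, and it does not claim to reproduce the cause of the reported hang, which the thread leaves undiagnosed. EventThread, submit, and nested_call_with_bound are hypothetical names invented for this illustration. The sketch shows one generic way a blocking call made from inside a callback can wait forever (the only thread able to complete the result is the one waiting for it), and how a bounded wait returns instead.

```cpp
#include <chrono>
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical single-threaded executor standing in for a client's
// event/callback thread. Not a Mesos or ZooKeeper type.
class EventThread {
public:
  EventThread() : worker_([this] { run(); }) {}

  ~EventThread() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      done_ = true;
    }
    cv_.notify_one();
    worker_.join();  // Drains any queued jobs, then exits.
  }

  // Queue a job for the event thread and return a future for its result.
  std::future<int> submit(std::function<int()> f) {
    auto task = std::make_shared<std::packaged_task<int()>>(std::move(f));
    std::future<int> result = task->get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      queue_.push([task] { (*task)(); });
    }
    cv_.notify_one();
    return result;
  }

private:
  void run() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
        if (queue_.empty()) return;  // Shutting down and nothing left to run.
        job = std::move(queue_.front());
        queue_.pop();
      }
      job();
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::function<void()>> queue_;
  bool done_ = false;
  std::thread worker_;  // Declared last so the members above exist before it starts.
};

// Runs a "callback" on the event thread that issues a second request to the
// same event thread and waits for it with a time bound.
// Returns 1 if the nested request completed within the bound, 0 if it timed out.
int nested_call_with_bound(std::chrono::milliseconds bound) {
  EventThread events;
  std::future<int> outer = events.submit([&events, bound] {
    std::future<int> inner = events.submit([] { return 42; });
    // An unbounded inner.get() here would never return: the inner job can
    // only run on this thread, which would be blocked waiting for it.
    return inner.wait_for(bound) == std::future_status::ready ? 1 : 0;
  });
  return outer.get();
}
```

Calling nested_call_with_bound(std::chrono::milliseconds(50)) returns 0: the nested request cannot complete while the callback occupies the event thread, so the bounded wait times out and the caller gets a chance to report an error rather than hang.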
