zookeeper-user mailing list archives

From Jonathan Simms <slyp...@gmail.com>
Subject Re: How to deal with fork() properly when using the zkc mt lib
Date Sun, 13 May 2012 02:42:42 GMT
Ok, so, I switched to using the single threaded zookeeper lib and
pumping the event loop "by hand," essentially replicating what
zookeeper_mt does, but in a ruby thread. There's only one thread
that's ever touching the zh. So I've been trying to get fork() working
again without having to lose my session in the parent. What I thought
would work is:

* parent process quiesces the event thread so there are
  no pending completions.
* parent exits event thread, keeps handle open
* fork()
* child *immediately* calls zookeeper_close
* parent resumes event thread
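
The plan above depends on what fork() does to the connection. The child's copy of the descriptor refers to the same socket as the parent's, so anything the child writes while tearing down its handle reaches the same peer. A minimal sketch of just that property, using socketpair() as a stand-in for the ZooKeeper connection. The function name and the "CLOSE"/"PING" payloads are made up for illustration, and no ZooKeeper calls are involved:

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo, not ZooKeeper code.
 * sv[0] plays the client socket, sv[1] plays the server end.
 * On success returns 0 and stores in `seen` what the "server" received
 * from the child; returns -1 on any failure.
 */
int child_write_reaches_shared_peer(char *seen, size_t seen_len)
{
    int sv[2];
    int status = 0;
    ssize_t n;
    pid_t pid;

    assert(seen != NULL && seen_len > 1);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the inherited fd is the same connection, not a copy of it. */
        const char msg[] = "CLOSE";
        ssize_t w = write(sv[0], msg, sizeof msg - 1);
        close(sv[0]); /* drops only the child's reference to the socket */
        _exit(w == (ssize_t)(sizeof msg - 1) ? 0 : 1);
    }

    /* Parent: wait for the child, then look at what the peer received. */
    if (waitpid(pid, &status, 0) != pid ||
        !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    n = read(sv[1], seen, seen_len - 1);
    if (n < 0)
        return -1;
    seen[n] = '\0';

    /* The child's close() did not close the parent's descriptor. */
    n = write(sv[0], "PING", 4);
    close(sv[0]);
    close(sv[1]);
    return n == 4 ? 0 : -1;
}
```

Calling child_write_reaches_shared_peer(buf, sizeof buf) leaves the child's bytes in buf. If zookeeper_close() in the child sends its close-session message over the inherited socket in the same way, the server would presumably see it as coming from the parent's session.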

Unfortunately, the parent gets:

Assertion failed: (cptr), function zookeeper_process, file
src/zookeeper.c, line 1959.
Abort trap: 6

(the relevant line from the source: http://is.gd/F4sIkq)

This is confusing.

From what I can understand from the source (and I am no C programmer),
this case is hit when there are "in flight" requests, but for some
reason, dequeue_completion is coming back with a NULL. Is this somehow
being tripped up in the parent by the child calling zookeeper_close?
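
A toy model of the case described above, not the library's code: replies are matched against a FIFO of sent requests, and a reply that arrives with nothing outstanding is the case where the real code asserts. All names here (pending_t, queue_sent, dequeue_pending, handle_reply) are hypothetical. One possible way to reach that state after fork() would be a reply to a request the child sent on the shared socket, but that is an assumption, not something confirmed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a client's list of in-flight requests. */
typedef struct pending {
    int xid;               /* id of the request that was sent */
    struct pending *next;
} pending_t;

typedef struct {
    pending_t *head;
    pending_t *tail;
} pending_queue_t;

/* Record a request as sent (append to the FIFO). */
void queue_sent(pending_queue_t *q, pending_t *p)
{
    assert(q != NULL && p != NULL);
    p->next = NULL;
    if (q->tail != NULL)
        q->tail->next = p;
    else
        q->head = p;
    q->tail = p;
}

/* Pop the oldest in-flight request, or NULL if nothing is outstanding. */
pending_t *dequeue_pending(pending_queue_t *q)
{
    pending_t *p = q->head;
    if (p != NULL) {
        q->head = p->next;
        if (q->head == NULL)
            q->tail = NULL;
    }
    return p;
}

/*
 * Handle a reply read from the socket.
 * Returns 0 if it matches the oldest in-flight request,
 * -1 if nothing was in flight (the case the real code asserts on),
 * -2 if the ids do not match.
 */
int handle_reply(pending_queue_t *q, int reply_xid)
{
    pending_t *p = dequeue_pending(q);
    if (p == NULL)
        return -1;
    return p->xid == reply_xid ? 0 : -2;
}
```

In this model a process that sent request 1 handles reply 1 fine, but any further reply it did not ask for comes back as -1.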

One last thing: is there really no way to close your connection and
resume your session later (using the API)?

I'm a bit out of my depth here, any help would be really appreciated.

On Fri, May 11, 2012 at 2:41 AM, Martin Kou <bitanarch@gmail.com> wrote:
> Jon,
>
> Hmm... I'm not sure when you're going to fork() though. What I've done
> before was to do the fork() before each zookeeper_init(). The simplest
> scheme is for each process to hold one event loop. Each event loop can then
> be shared by as many single-threaded Zookeeper sessions as you see fit. So
> let's say you pre-fork() 4 processes, and each process runs 4 Zookeeper
> sessions - you'll be able to run 16 Zookeeper sessions in parallel.
>
> If you do the fork() after zookeeper_init() - then it will get messier. As
> I understand each forked process will increment the reference count on any
> opened file descriptors. So you'll have to take care to close the "shared"
> file descriptors in every "other" process before you call zookeeper_close().
>
> Best Regards,
> Martin Kou
>
> On Thu, May 10, 2012 at 9:19 PM, Jonathan Simms <slyphon@gmail.com> wrote:
>
>> Well afaict SO_NOSIGPIPE doesn't exist in linux, which kinda sucks, as
>> I need this to be cross platform. I even tried hacking the source to
>> allow an option to not send the last message (diff is here:
>> http://is.gd/NptC0n and yes, I know this is an incredibly naive
>> attempt).
>>
>> This would be an incredibly useful feature in my case (disconnect the
>> client and resume the session within the negotiated timeout).
>>
>> BTW, is it possible to fork safely when using the st library?
>>
>> Thanks
>>
>> On Thu, May 10, 2012 at 9:46 PM, Jonathan Simms <slyphon@gmail.com> wrote:
>> > wow
>> >
>> > That's scary, but, probably also useful. :)
>> >
>> > I'm considering rewriting this using the st library, considering all
>> > the craziness necessary to use the mt lib.
>> >
>> > I'm gonna go try that out.
>> >
>> > On Thu, May 10, 2012 at 7:53 PM, Martin Kou <bitanarch@gmail.com> wrote:
>> >> If you don't mind the hackish-ness, I think you can just grab the file
>> >> descriptor from a Zookeeper handle like this for mt -
>> >>
>> >> int fd = ((int *)zhandle)[0];
>> >>
>> >> This works because the fd is the first field in the _zhandle struct.
>> >>
>> >> Best Regards,
>> >> Martin Kou
>> >>
>> >> On Thu, May 10, 2012 at 4:51 PM, Martin Kou <bitanarch@gmail.com> wrote:
>> >>
>> >>> I've had a similar problem as well, but I've been using the single
>> >>> threaded async library - I actually find it simpler to use than the mt
>> >>> library.
>> >>>
>> >>> The way I do it is this:
>> >>>
>> >>> During session connect -
>> >>>  1. Grab the file descriptor from the C library via zookeeper_interest()
>> >>>  2. If this is the first time I saw this file descriptor, and it's valid,
>> >>>     do a setsockopt() on it to set SO_NOSIGPIPE to 1.
>> >>>
>> >>> When I need to "suspend" the session
>> >>>  1. close() the file descriptor
>> >>>  2. call zookeeper_close() on the handle
>> >>>
>> >>> zookeeper_close() will try to send the close session message at step 2
>> >>> here. Normally, that would cause a SIGPIPE and your app would crash - but
>> >>> this time it won't because you've set SO_NOSIGPIPE on the socket. Instead,
>> >>> the Zookeeper library will see a regular error from its send operation and
>> >>> it'll free up the handle peacefully without closing the session.
>> >>>
>> >>> Best Regards,
>> >>> Martin Kou
>> >>>
>> >>>
>> >>> On Thu, May 10, 2012 at 4:11 PM, Jonathan Simms <slyphon@gmail.com> wrote:
>> >>>
>> >>>> Michi, fair point, I actually just looked into it, there doesn't seem
>> >>>> to be a way through the api to re-establish the session. If you call
>> >>>> zookeeper_close on the handle:
>> >>>>
>> >>>>  "After this call, the client session will no longer be valid. The
>> >>>> function will flush any outstanding send requests before return. As a
>> >>>> result it may block."
>> >>>>
>> >>>> I tried:
>> >>>>
>> >>>> * establish session with handle A
>> >>>> * copy clientid_t from handle A
>> >>>> * zookeeper_close handle A
>> >>>> * construct handle B using clientid_t values from handle A
>> >>>>
>> >>>> I get back a SESSION_EXPIRED from the server. (debug from mt lib here:
>> >>>> https://gist.github.com/3b7e4060746d03cef287)
>> >>>>
>> >>>> It would be *really* useful if i could basically "suspend" a session
>> >>>> while i forked, then reconnect and pick up where i left off. Is this
>> >>>> not possible?
>> >>>>
>> >>>> On Thu, May 10, 2012 at 6:41 PM, Michi Mutsuzaki <michi@cs.stanford.edu> wrote:
>> >>>> > Hi Jonathan,
>> >>>> >
>> >>>> > It would be very difficult to share multi-threaded zk handle with
>> >>>> > child process. I'm surprised it actually works on mac. I think saving
>> >>>> > session id/password and re-establishing the session in the child
>> >>>> > process is more robust and platform independent.
>> >>>> >
>> >>>> > Thanks!
>> >>>> > --Michi
>> >>>> >
>> >>>> > On Thu, May 10, 2012 at 12:45 PM, Jonathan Simms <slyphon@gmail.com> wrote:
>> >>>> >> Hi all,
>> >>>> >>
>> >>>> >> I'm the maintainer of the ruby zookeeper library, and I'm having
>> >>>> >> trouble getting consistent behavior when a user calls fork(). When
>> >>>> >> developing it on MacOS (using 3.3.5), I was able to fork, then
>> >>>> >> immediately call zookeeper_close() in the child, and then create a new
>> >>>> >> handle. Testing on Linux, the behavior is much more unpredictable.
>> >>>> >> Regularly, it seems there are segfaults when calling zookeeper_close.
>> >>>> >> https://gist.github.com/22338464cd47e0e50970
>> >>>> >>
>> >>>> >>
>> >>>> >> So I guess my question is, is there any safe way to fork() while the
>> >>>> >> client is running?
>> >>>> >>
>> >>>> >> Another possibility i thought of is to note the session id/passwd,
>> >>>> >> close the client, fork, then re-open with the same id/passwd to
>> >>>> >> re-establish the session in the parent.
>> >>>> >>
>> >>>> >> Any recommendations?
>> >>>>
>> >>>
>> >>>
>>
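
A note on the `((int *)zhandle)[0]` trick quoted earlier in the thread: it relies on C's guarantee that a pointer to a struct, suitably converted, points at the struct's first member. A minimal sketch with a made-up struct. `fake_handle` and both function names are hypothetical, and the library's real handle layout is private and not reproduced here:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an opaque handle whose first field is the fd. */
struct fake_handle {
    int fd;
    const char *hostname;
};

/* Read the first int stored in the object, as the quoted cast does. */
int fd_via_cast(const void *handle)
{
    assert(handle != NULL);
    return ((const int *)handle)[0];
}

/* The cast is only meaningful while fd stays at offset 0 in the struct. */
int fd_is_first_member(void)
{
    return offsetof(struct fake_handle, fd) == 0;
}
```

The cast keeps compiling if a later version of the struct moves or retypes that first field, and then silently returns garbage. That is the hackish part: nothing in the public header promises the layout.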
