zookeeper-user mailing list archives

From "Fournier, Camille F." <Camille.Fourn...@gs.com>
Subject RE: libzookeeper_mt and GDB
Date Mon, 18 Jul 2011 14:41:48 GMT
ZooKeeper can't possibly know that you are in GDB unless you have a special message that you
send to the server that says "I'm in a debugger now, please don't expire me". You might be
able to hack something in to do this, but do you really want to? I think the second idea is
best. If you are a developer working in any kind of multi-threaded distributed system, you
need to be aware that suspending all threads can lead to the remote parts of your process
failing. That's just professional distributed systems development 101. This isn't unique to
C; Java developers also have to choose between suspending all threads during debugging and
suspending only the thread affected by the breakpoint.

You can also split the difference between points one and two: get the message out to the
developers that if they're working against ZK and suspend all threads, they might end up
losing their session, but in environments where you expect a lot of debugging (development,
QA), jack up the timeout so it happens less frequently.
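A minimal sketch of the "longer timeout only where you debug" policy, in C since that is the client under discussion. The environment labels, the function name session_timeout_for_env, and both timeout values are assumptions for illustration, not ZooKeeper settings:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative values only (hypothetical); tune for your deployment. */
#define PROD_SESSION_TIMEOUT_MS  10000
#define DEBUG_SESSION_TIMEOUT_MS 40000

/* Pick a session timeout from a deployment label such as "development" or
 * "qa". Unknown or missing labels fall back to the production value, so a
 * mis-set label never lengthens lock lifetimes in production. */
int session_timeout_for_env(const char *deploy_env)
{
    static const char *const debug_envs[] = { "development", "qa" };

    if (deploy_env == NULL)
        return PROD_SESSION_TIMEOUT_MS;
    for (size_t i = 0; i < sizeof debug_envs / sizeof debug_envs[0]; i++) {
        if (strcmp(deploy_env, debug_envs[i]) == 0)
            return DEBUG_SESSION_TIMEOUT_MS;
    }
    return PROD_SESSION_TIMEOUT_MS;
}
```

The result would be passed as the recv_timeout argument of zookeeper_init(), e.g. `zookeeper_init(hosts, watcher, session_timeout_for_env(getenv("APP_DEPLOY_ENV")), NULL, NULL, 0)`, where APP_DEPLOY_ENV is a hypothetical variable. The server may still cap the requested value (by default at 20 × tickTime, i.e. 40 s with tickTime=2000 as in the sample configuration), which is why the debug value here stops at 40000 ms.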

If you truly want to separate the process from its ZooKeeper heartbeating, you could take
a tip from the HBase devs in https://issues.apache.org/jira/browse/HBASE-1316. Because dealing
with timeouts is much more of an issue in large Java processes due to full GC pauses, they have experimented
with various solutions that you might be able to apply here in C.
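One building block for moving heartbeating out of the debugged process is a liveness check that a helper process can run. The sketch below assumes the main process has a thread writing one byte to a pipe every so often; in GDB's default all-stop mode that thread stops along with everything else, so the helper sees the pipe go quiet. The function name and the pipe protocol are hypothetical, and actually taking over the session (for example by handing the helper the value of zoo_client_id() so it can pass it to zookeeper_init()) is left out:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper-side check. Waits up to timeout_ms for a heartbeat
 * byte on heartbeat_fd (the read end of a pipe the parent writes to).
 * Returns 1 if nothing arrived (parent probably stopped, e.g. at a
 * breakpoint), 0 if a heartbeat arrived, -1 on error or if the parent
 * closed its end of the pipe. */
int parent_is_unresponsive(int heartbeat_fd, int timeout_ms)
{
    assert(timeout_ms > 0);
    struct pollfd pfd = { .fd = heartbeat_fd, .events = POLLIN };

    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;          /* poll failed (e.g. EINTR); caller decides */
    if (rc == 0)
        return 1;           /* timed out: no heartbeat in this window */

    /* Consume up to 64 pending heartbeat bytes so the next call measures
     * a fresh window. */
    char buf[64];
    ssize_t n = read(heartbeat_fd, buf, sizeof buf);
    if (n <= 0)
        return -1;          /* EOF: parent closed the pipe or exited */
    return 0;
}
```

A helper would call this in a loop and only step in to hold the session while it keeps returning 1.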


-----Original Message-----
From: Stephen Tyree [mailto:tyree731@gmail.com] 
Sent: Monday, July 18, 2011 10:07 AM
To: user@zookeeper.apache.org
Subject: libzookeeper_mt and GDB

Hello All,

I've been using ZooKeeper at my place of work for a few months now
successfully, but there is a lingering issue I haven't been able
to solve cleanly. Namely, when using GDB with libzookeeper_mt,
once you hit a breakpoint, the program you're running essentially has
until the session timeout to continue onward or its session will be
expired. This is a pain in the butt when using ephemeral znodes, but in
my case those ephemeral znodes are tied to locks, which means losing them
is bad news. I've tried a number of different ideas to solve this issue,
with varying degrees of success.

The first idea I had was jacking up the session timeouts, which
obviously works. This extends the time you have at any given breakpoint
to figure out the issue and move onward, but comes at the expense of
ephemeral znodes living for much longer than they reasonably should when
the program crashes (something that is likely to be an issue if you're
using GDB). In the case of locking, those znodes hang around for a
while and keep their locks held, which hurts the performance of the system. This
is how we currently deal with the issue.
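The ceiling on this approach can be made concrete. When minSessionTimeout and maxSessionTimeout are not set in the server configuration, the server bounds the requested timeout to the range 2 × tickTime to 20 × tickTime, and after a crash an ephemeral lock znode can outlive its owner by up to roughly that negotiated timeout. The helper below only models that clamping; its name is made up and it is not part of the ZooKeeper API:

```c
#include <assert.h>

/* Models the server-side bounds on a requested session timeout when
 * minSessionTimeout/maxSessionTimeout are unset: the defaults are
 * 2 * tickTime and 20 * tickTime. Hypothetical helper, not a ZK call. */
int negotiated_timeout_ms(int requested_ms, int tick_time_ms)
{
    assert(requested_ms > 0 && tick_time_ms > 0);
    int min_ms = 2 * tick_time_ms;
    int max_ms = 20 * tick_time_ms;

    if (requested_ms < min_ms)
        return min_ms;
    if (requested_ms > max_ms)
        return max_ms;
    return requested_ms;
}
```

With tickTime=2000, asking for 120000 ms gets 40000 ms; going higher requires raising maxSessionTimeout on the servers, which also lengthens how long a crashed process's locks can linger.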

The second idea was to instruct all developers at my job to use GDB
non-stop mode for debugging. This works, since GDB would only stop the
thread which hit a breakpoint in this mode, but runs into the issue that
I need to change the development habits of hundreds of engineers just to
save myself the trouble. Ideally ZooKeeper would work with GDB in
whatever mode you felt like using.

The third idea was decidedly more intricate. Essentially I spawn a
subprocess which uses the exact same session I do, but only holds onto
that session while the parent process is unresponsive (at a breakpoint
probably). This essentially keeps your session alive while at breakpoints, but
has no impact while not at breakpoints. The only caveat to this approach
is the transition between breakpoints and non-breakpoints. Since the
server last saw the session in the subprocess, it doesn't send heartbeat
messages to the parent process. This means it's up to the parent process
to send PING messages to the server in order to reestablish the session,
but it only sends one after 1/3 of the session timeout has elapsed (which is too long).
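The caveat can be put in numbers. Taking the description above at face value, the client waits until about a third of the session timeout has passed without sending anything before it sends a PING. The sketch models that threshold: ping_divisor = 3 reproduces the described behaviour, while other values stand in for the hypothetical configurable PING knob suggested in the next paragraph. Both function names are illustrative:

```c
#include <assert.h>

/* Idle time after which a PING is sent, modelled as
 * session_timeout / ping_divisor (3 matches the behaviour described). */
int ping_interval_ms(int session_timeout_ms, int ping_divisor)
{
    assert(session_timeout_ms > 0 && ping_divisor > 0);
    return session_timeout_ms / ping_divisor;
}

/* Milliseconds the client would still wait before pinging, given how
 * long it has been since it last sent anything; 0 means "ping now". */
int ms_until_next_ping(int session_timeout_ms, int ping_divisor,
                       int idle_send_ms)
{
    int remaining = ping_interval_ms(session_timeout_ms, ping_divisor)
                    - idle_send_ms;
    return remaining > 0 ? remaining : 0;
}
```

With a 30 s timeout the parent could sit for up to 10 s before re-asserting itself; a divisor of 10 would cut that to 3 s at the cost of more PING traffic.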

Whatever the case, a simple, generic solution would be ideal for this
situation. It might be as simple as allowing configurable PING messages
(for the third solution) or it might be as frustrating as creating a
ZooKeeper service which runs outside of the process (thus bypassing
GDB's breakpoints). Any ideas?

Stephen Tyree
