zookeeper-user mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: libzookeeper_mt and GDB
Date Mon, 18 Jul 2011 16:09:09 GMT
I haven't used gdb in a bunch of years and looking at the manual, I don't
see a way to continue a single thread.  That makes my second suggestion
silly unless there is something I didn't see (which is decidedly possible).

On Mon, Jul 18, 2011 at 9:05 AM, Ted Dunning <ted.dunning@gmail.com> wrote:

> I have two suggestions that might or might not work.
>
> First, you can increase the timeouts to high values and also write a bit of
> code that can expire the session instantly.  The ZK unit tests have examples
> of how to do this by opening a second connection with the same session id
> and then closing it.  This has the effect of instantly expiring the original
> connection.  You still have a bit of an education process here.  This is
> high risk since the configuration file with the long timeouts will probably
> get checked in by mistake at some point.  There might be a way to avoid this
> with a special startup option that overrides the session length for just
> the one invocation.
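
A per-invocation override of this kind could be sketched in C roughly as follows. This is a minimal sketch, not an existing ZooKeeper feature: the environment variable name ZK_DEBUG_SESSION_TIMEOUT_MS and both function names are hypothetical. The idea is that the long timeout lives only in the debugging shell, so nothing with a long timeout can be checked in by mistake.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical variable name; it is not defined by ZooKeeper itself. */
#define DEBUG_TIMEOUT_ENV "ZK_DEBUG_SESSION_TIMEOUT_MS"

/* Parse an override string. Returns the parsed value when it is a
 * positive integer that fits in an int, otherwise default_ms. */
int parse_timeout_ms(const char *raw, int default_ms)
{
    if (raw == NULL || *raw == '\0')
        return default_ms;

    char *end = NULL;
    errno = 0;
    long value = strtol(raw, &end, 10);
    if (errno != 0 || *end != '\0' || value <= 0 || value > INT_MAX)
        return default_ms;          /* ignore malformed overrides */
    return (int)value;
}

/* Session timeout to request for this invocation: the environment
 * override when present and valid, else the normal default. */
int session_timeout_ms(int default_ms)
{
    return parse_timeout_ms(getenv(DEBUG_TIMEOUT_ENV), default_ms);
}
```

The result would be passed as the recv_timeout argument of zookeeper_init(), for example after starting the program with ZK_DEBUG_SESSION_TIMEOUT_MS=600000 set only in the debugging shell. Note that the server negotiates the final value and may reduce it to its configured maximum session timeout.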
>
> A second idea is that you might be able to define a gdb macro that is
> invoked when you hit a breakpoint and another that is invoked at continue
> time (or manually).  The first macro would invoke a function to start or
> continue a background thread that can keep the heartbeats going.  The second
> macro would kill that thread and restore normal operation.  The ideal case
> would be to continue just the normal ZK heartbeat thread except that might
> cause notifications to be called in the background which could confuse the
> person doing the debugging.
>
> If you can make it work, the second approach would give you something
> approaching a normal debugging experience.
>
>
> On Mon, Jul 18, 2011 at 7:41 AM, Fournier, Camille F. <
> Camille.Fournier@gs.com> wrote:
>
>> ZooKeeper can't possibly know that you are in GDB unless you have a
>> special message that you send to the server that says "I'm in a debugger
>> now, please don't expire me". You might be able to hack something in to do
>> this, but do you really want to? I think the second idea is best. If you are
>> a developer working in any kind of multi-threaded distributed system, you
>> need to be aware that suspending all threads can lead to the remote parts of
>> your process failing. That's just professional distributed systems
>> development 101. This isn't unique to C; Java developers also have to choose
>> between suspending all threads during debugging and suspending only the
>> thread affected by the breakpoint.
>>
>> You can also split the difference between points one and two, namely, get
>> the message out to the developers that if they're working against ZK and
>> suspend all threads, they might end up losing their session, but when
>> working in an env that you expect to do a lot of debugging in (development,
>> QA), jack up the timeout so it happens less frequently.
>>
>> If you truly want to separate the process from its zookeeper heartbeating,
>> you could take a tip from the HBASE devs in
>> https://issues.apache.org/jira/browse/HBASE-1316. Because dealing with
>> timeouts is much more of an issue in large Java processes due to full GC,
>> they have experimented with various solutions that you might be able to
>> apply here in C.
>>
>> C
>>
>>
>> -----Original Message-----
>> From: Stephen Tyree [mailto:tyree731@gmail.com]
>> Sent: Monday, July 18, 2011 10:07 AM
>> To: user@zookeeper.apache.org
>> Subject: libzookeeper_mt and GDB
>>
>> Hello All,
>>
>> I've been using Zookeeper at my place of work for a few months now
>> successfully, but there is a lingering problem I haven't been able
>> to solve cleanly. Namely, when using GDB with libzookeeper_mt,
>> once you hit a breakpoint, the program you're running essentially has
>> until the session timeout to continue onward or its session will be
>> expired. This is a pain in the butt when using ephemeral znodes, but in
>> my case those ephemeral znodes are tied to locks which means losing them
>> is bad news. I've tried a number of different ideas to solve this issue,
>> with varying degrees of success.
>>
>> The first idea I had was jacking up the session timeouts, which
>> obviously works. This extends the time you have at any given breakpoint
>> to figure out the issue and move onward, but comes at the expense of
>> ephemeral znodes living for much longer than they reasonably should when
>> the program crashes (something that is likely to be an issue if you're
>> using GDB). In the case of locking, those znodes which hang around for a
>> while have negative consequences on the performance of the system. This
>> is how we currently deal with the issue.
>>
>> The second idea was to instruct all developers at my job to use GDB
>> non-stop mode for debugging. This works, since GDB would only stop the
>> thread which hit a breakpoint in this mode, but runs into the issue that
>> I need to change the development habits of hundreds of engineers just to
>> save myself the trouble. Ideally Zookeeper would function with GDB in
>> whatever mode you felt like using.
>>
>> The third idea was decidedly more intricate. Essentially I spawn a
>> subprocess which uses the exact same session I do, but only holds onto
>> that session while the parent process is unresponsive (at a breakpoint
>> probably). This essentially locks your session while at breakpoints, but
>> has no impact while not at breakpoints. The only caveat to this approach
>> is the transition between breakpoints and non-breakpoints. Since the
>> server last saw the session in the subprocess, it doesn't send heartbeat
>> messages to the parent process. This means it's up to the parent process
>> to send PING messages to the server in order to reestablish the session,
>> but this only happens at 1/3 of the session timeout (which is too long).
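
To put numbers on the 1/3 rule described above, here is a small C sketch of the arithmetic. It is a simplified model under stated assumptions, not the client library's actual timer code: the function names are hypothetical, and it assumes the parent waits the full ping interval after resuming.

```c
/* Idle time after which the client sends a PING, following the
 * "1/3 of the session timeout" rule described in the message. */
int ping_interval_ms(int session_timeout_ms)
{
    return session_timeout_ms / 3;
}

/* Simplified model (an assumption, not library behavior): time left
 * on the session when the parent's first PING goes out, given that
 * the server last heard from the helper process idle_at_handoff_ms
 * before the parent resumed. A result <= 0 means the session would
 * already have expired. */
int first_ping_margin_ms(int session_timeout_ms, int idle_at_handoff_ms)
{
    return session_timeout_ms
           - idle_at_handoff_ms
           - ping_interval_ms(session_timeout_ms);
}
```

With a 30000 ms session timeout the interval is 10000 ms. If the timeout has been raised to 600000 ms for debugging, the parent could sit for 200000 ms (200 seconds) before pinging, which is why the fixed 1/3 ratio is too long here.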
>>
>> Whatever the case, a simple, generic solution would be ideal for this
>> situation. It might be as simple as allowing configurable PING messages
>> (for the third solution) or it might be as frustrating as creating a
>> Zookeeper service which runs outside of the process (thus bypassing
>> GDB's breakpoints). Any ideas?
>>
>> Thanks,
>> Stephen Tyree
>>
>
>
