zookeeper-user mailing list archives

From Andrew Jorgensen <and...@andrewjorgensen.com>
Subject Re: Zookeeper c client disconnecting after hours of runtime
Date Thu, 04 May 2017 12:58:09 GMT
One other piece of information that might be helpful: if I look at
"lsof -n -P" for the process, I can see 3 entries for connections to
zookeeper on port 2181 in the ESTABLISHED state. However, if I look at a
node that has had the session time out, the connection appears to be in
the CLOSE_WAIT state. Is there a clean way to recover from this?
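
For anyone who wants to log the same socket states over time rather than
running lsof by hand, here is a minimal sketch in Python. It assumes a Linux
host and reads /proc/net/tcp, which is system-wide rather than per-process, so
it counts every socket on the machine whose remote port is 2181. The function
name count_states is made up for this example.

```python
from collections import Counter

# TCP state codes as they appear (in hex) in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_states(lines, remote_port=2181):
    """Count sockets per TCP state for a given remote port.

    `lines` is any iterable of lines in /proc/net/tcp format.
    count_states is a hypothetical helper, not part of any library.
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip the header row and anything that is not a socket entry.
        if len(fields) < 4 or ":" not in fields[2]:
            continue
        # The remote address column is HEXIP:HEXPORT.
        port_hex = fields[2].rsplit(":", 1)[1]
        try:
            port = int(port_hex, 16)
        except ValueError:
            continue
        if port == remote_port:
            state = fields[3].upper()
            counts[TCP_STATES.get(state, state)] += 1
    return counts


if __name__ == "__main__":
    # IPv6 sockets are listed in /proc/net/tcp6 in the same column layout.
    with open("/proc/net/tcp") as f:
        print(dict(count_states(f)))
```

Run from cron or a loop, this would show roughly when a connection moves from
ESTABLISHED to CLOSE_WAIT, which can then be lined up against the server-side
session expiry logs.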

Andrew Jorgensen
@ajorgensen

On Wed, May 3, 2017 at 11:32 PM, Andrew Jorgensen <
andrew@andrewjorgensen.com> wrote:

> I am going to try to provide as much information as possible, but it might
> be a bit sparse because I am still actively trying to get a grip on what
> exactly I'm seeing with the C client.
>
> Zookeeper client version: 3.4.5
> Zookeeper server version: 3.4.10
> 5 node zookeeper cluster
>
> The workflow I have is essentially a long-lived process that establishes an
> ephemeral node with some data that is read by some number of other
> processes located on separate machines: standard cluster coordination
> stuff. The issue I am seeing is that after about 7-9 hours of runtime,
> zookeeper expires the client session because it has reached the 30 second
> timeout. On the zookeeper client side, I've confirmed there are no calls to
> the supplied watcher functions or context supplied to zookeeper_init. The
> long-lived process is doing other things during its runtime, but its
> interaction with zookeeper is only via callback events and a pipe after
> establishing the ephemeral node at the beginning.
>
> One other data point is that I created an event loop that uses the same
> client that established the ephemeral node to get the data from the
> ephemeral node every 60 seconds and log it. While this event loop is
> running, I do not observe the client session expiring at all, even after 14
> hours of runtime.
>
> I am not sure how to explain the client disconnecting without any message
> to either the callback function or the context. I also am not sure how to
> explain this behavior happening after many hours of running without issue.
>
> If anyone has seen something similar, how did you go about fixing it? Also,
> if there are any ideas on how to debug this issue, that would be very
> helpful.
>
> Thanks!
> Andrew Jorgensen
> @ajorgensen
>
