curator-user mailing list archives

From: Jeremy Stribling
Subject: Re: adding a "network timeout" to curator?
Date: Thu, 27 Feb 2014 05:45:39 GMT

Please correct me if I'm wrong, but I thought Curator goes into 
SUSPENDED mode when it gets a Disconnected state event from its ZK 
client.  That is not necessarily the same as a network issue, because 
the ZK keepalive could be stuck in the ZK server's processing queue, 
blocked on a slow disk.  What I'm proposing is a true, network-only 
timeout that could be used to declare a client disconnected quickly if 
there's a network issue, without having to reduce the ZK session 
timeout so low that a slow disk would cause false positives.  
Does that make sense?


On 02/26/2014 09:25 PM, Jordan Zimmerman wrote:
> Curator should already go into SUSPENDED when there is a connection 
> issue, right? How would this be different?
> -JZ
> ------------------------------------------------------------------------
> From: Jeremy Stribling
> Date: February 26, 2014 at 7:56:26 AM
> Subject: adding a "network timeout" to curator?
>> Hi all,
>> I started a thread on the ZK list a while back about timeouts in ZK.
>> You can find it in the archives here:

>> The basic idea is that when ZK is running on a node with slow disks
>> (e.g., in a VM), you might want to set your session timeout to a long
>> value (e.g., 30 seconds or 60 seconds), but still detect network
>> timeouts quickly. On that thread, Michi proposed using 'ruok' commands
>> from the client to test network connectivity, along with the normal
>> client pings happening in the background to detect server slowness.
>> I was wondering if this would make sense to provide as part of the
>> Curator Framework or Client. There could be some background thread
>> sending 'ruok' commands to whatever server the client is connected to,
>> and going into SUSPENDED (or LOST?) mode when it hits a timeout or gets
>> a failure back. We might be able to implement something like that here
>> and contribute it back, if it sounds interesting to other people and we
>> can agree on a design. Any thoughts?
>> Jeremy
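
To make the 'ruok' idea above concrete, here is a minimal sketch of a single network-only probe in plain Java. It is not part of Curator or the ZooKeeper client. The class name RuokProbe, the method probe, and the timeoutMillis parameter are hypothetical names chosen for illustration. It assumes the server answers the 'ruok' four-letter command with 'imok' on its client port; newer ZooKeeper releases may require that command to be explicitly enabled on the server.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Curator or ZooKeeper API.
public class RuokProbe {

    // Sends "ruok" to host:port and returns true only if "imok" comes back.
    // timeoutMillis bounds the connect and each individual read, so the
    // total wall-clock time can be somewhat longer than timeoutMillis.
    public static boolean probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            socket.setSoTimeout(timeoutMillis);

            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read exactly four bytes of reply, or stop early on end of stream.
            InputStream in = socket.getInputStream();
            byte[] reply = new byte[4];
            int read = 0;
            while (read < reply.length) {
                int n = in.read(reply, read, reply.length - read);
                if (n < 0) {
                    break;
                }
                read += n;
            }
            return read == reply.length
                    && "imok".equals(new String(reply, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            // Connect failure, read timeout, or reset: treat as "not reachable".
            return false;
        }
    }
}
```

A background thread could call RuokProbe.probe(host, 2181, 2000) every few seconds against the currently connected server and treat a run of failures as a network problem, independently of the long session timeout. How such a failure should map onto SUSPENDED or LOST is the open design question raised in this thread and is not addressed by the sketch.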
