curator-user mailing list archives

From Jordan Zimmerman <jor...@jordanzimmerman.com>
Subject Re: adding a "network timeout" to curator?
Date Thu, 27 Feb 2014 06:16:32 GMT
I see. So, this is a slow network. You’d like a heuristic that puts Curator into SUSPENDED
mode when the network performance drops. Sounds interesting to me. 

-JZ

From: Jeremy Stribling
Reply: Jeremy Stribling strib@vmware.com
Date: February 27, 2014 at 11:15:46 AM
To: Jordan Zimmerman jordan@jordanzimmerman.com, user@curator.apache.org
Subject: Re: adding a "network timeout" to curator?
Please correct me if I'm wrong, but I thought Curator went into SUSPENDED mode when it gets
a Disconnected state event from its ZK client.  That is not necessarily the same as a network
issue, because that ZK keepalive could be stuck in the ZK server processing queue, blocked
on a slow disk.  What I'm proposing would be a true, network-only timeout that could be used
to declare a client disconnected quickly if there's a network issue, without having to reduce
the ZK session timeout so low that a slow disk would cause false negatives.  Does that make
sense?

Jeremy

On 02/26/2014 09:25 PM, Jordan Zimmerman wrote:
Curator should already go into SUSPENDED when there is a connection issue, right? How would
this be different?

-JZ

From: Jeremy Stribling
Reply: user@curator.apache.org
Date: February 26, 2014 at 7:56:26 AM
To: user@curator.apache.org
Subject: adding a "network timeout" to curator?
Hi all,

I started a thread on the ZK list a while back about timeouts in ZK.
You can find it in the archives here:

http://mail-archives.apache.org/mod_mbox/zookeeper-user/201309.mbox/%3C522F7A9D.20800@nicira.com%3E

The basic idea is that when ZK is running on a node with slow disks
(e.g., in a VM), you might want to set your session timeout to a long
value (e.g., 30 seconds or 60 seconds), but still detect network
timeouts quickly. On that thread, Michi proposed using 'ruok' commands
from the client to test network connectivity, along with the normal
client pings happening in the background to detect server slowness.

I was wondering if this would make sense to provide as part of the
Curator Framework or Client. There could be some background thread
sending 'ruok' commands to whatever server the client is connected to,
and going into SUSPENDED (or LOST?) mode when it hits a timeout or gets
a failure back. We might be able to implement something like that here
and contribute it back, if it sounds interesting to other people and we
can agree on a design. Any thoughts?
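
To make the idea concrete, here is a minimal sketch of a single 'ruok' probe in plain Java. It is only an illustration under stated assumptions: the class name RuokProbe and the method isReachable are hypothetical and not part of Curator or ZooKeeper, the host, port, and timeout are supplied by the caller, and depending on the ZooKeeper version the four-letter commands may need to be enabled on the server. The timeout applies to the connect and to each read separately.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration; not part of Curator or ZooKeeper. */
public final class RuokProbe {
    private RuokProbe() {}

    /**
     * Sends the four-letter command "ruok" and returns true only if the
     * server answers "imok" without a connect or read timing out.
     */
    public static boolean isReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            socket.setSoTimeout(timeoutMs); // bounds each blocking read

            OutputStream out = socket.getOutputStream();
            out.write("ruok".getBytes(StandardCharsets.US_ASCII));
            out.flush();

            byte[] reply = new byte[4];
            InputStream in = socket.getInputStream();
            int read = 0;
            while (read < reply.length) {
                int n = in.read(reply, read, reply.length - read);
                if (n < 0) {
                    break; // server closed the connection early
                }
                read += n;
            }
            return read == reply.length
                    && "imok".equals(new String(reply, StandardCharsets.US_ASCII));
        } catch (IOException e) {
            // Connect failure, read timeout, or reset: treat as unreachable.
            return false;
        }
    }
}
```

A background thread (for example, one driven by a ScheduledExecutorService) could call RuokProbe.isReachable against the currently connected server every few seconds and report a suspended connection after one or more consecutive failures. How that would hook into Curator's connection state handling is exactly the design question that is still open.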

Jeremy

