zookeeper-user mailing list archives

From Deepinder Singh Setia <dse...@juniper.net>
Subject Re: OperationTimeoutException error
Date Wed, 14 Aug 2013 17:11:08 GMT
Hanno, thanks for your feedback. I have a better understanding of the
problem now. 

I am not using a dedicated disk for the transaction log or a dedicated
machine for ZooKeeper. I will seriously consider the latter first (which
will automatically solve the former issue).

Meanwhile, I have increased the session timeout as a workaround. I am able
to do that because my sundry clients do not communicate with ZooKeeper
directly, but instead go through a proxy process. Thus, it is possible to
increase the session timeout for what is essentially a single ZooKeeper
client.

I am also going to look at client-side retries, along with tuning of GC
parameters, to further alleviate the problem.
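
For illustration, the client-side retry idea can be sketched in plain
Python. This is only a rough sketch of retry with exponential backoff, not
Kazoo's implementation: the exception class OperationTimeout and the helper
retry_call are hypothetical stand-ins, and the parameter names and defaults
are assumptions chosen for the example. In practice, Kazoo's own
client.retry (mentioned in the quoted message below) would be the natural
thing to use.

```python
import time


class OperationTimeout(Exception):
    """Hypothetical stand-in for a retryable client-side timeout error."""


def retry_call(func, *args, max_tries=3, delay=0.1, backoff=2.0,
               sleep=time.sleep, **kwargs):
    """Call func, retrying on OperationTimeout with exponential backoff.

    max_tries is the total number of attempts (first call included).
    The sleep function is injectable so the helper can be tested
    without actually waiting.
    """
    attempt = 0
    wait = delay
    while True:
        attempt += 1
        try:
            return func(*args, **kwargs)
        except OperationTimeout:
            if attempt >= max_tries:
                # Out of attempts: let the caller see the last error.
                raise
            sleep(wait)
            wait *= backoff  # wait longer before each further attempt
```

A proxy process like the one described above could wrap each read or write
in such a helper, so that a single stalled operation does not surface as a
failure to the clients behind it.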


On 8/10/13 12:02 PM, "Hanno Schlichting" <hanno@hannosch.eu> wrote:

>On Fri, Aug 9, 2013, at 18:13, Deepinder Singh Setia wrote:
>> Aug  9 07:07:20 a2s1 python[2085]: OperationTimeoutException: operation
>> timeout
>That's one of the "retryable exceptions" in Kazoo. So if you use
>client.retry, you can tolerate one or more instances of this error.
>> zookeeper logs around the error time:
>> 2013-08-09 07:07:06,580 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] -
>> fsync-ing the write ahead log in SyncThread:0 took 2291ms which will
>> adversely effect operation latency. See the ZooKeeper troubleshooting
>> guide
>More than 2 seconds of fsync stall is quite long. And with that or GC
>pauses, it's more than likely that you exceed the session timeout.
>Did you follow the recommendations in
>http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html? Especially
>around using dedicated disks for the transaction log and using a
>dedicated machine for Zookeeper to avoid other processes stalling it?
>> Could the client (Kazoo) be timing out because of fsync delay? What
>> parameter would control duration for OperationTimeoutException that I
>> could perhaps increase to verify? There is only one ZooKeeper client and
>> the load isn't much - 1 read/sec and 2 writes/sec roughly. Zookeeper
>> config is default. Kazoo client params are also default.
>In the admin guide, look at tickTime and syncLimit. In a default config
>the session timeout is ~4 seconds. While you can increase this value,
>you thereby also increase the minimum time it takes Zookeeper to
>consider an actual client to be dead. Depending on what you use ZK for,
>you might prefer failing fast and thus low session timeout values.
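
To make the relationship between tickTime and the session timeout concrete,
here is a small sketch of the arithmetic. It assumes the behaviour described
in the ZooKeeper admin guide: the server clamps the timeout a client
requests to the range [minSessionTimeout, maxSessionTimeout], which default
to 2 x tickTime and 20 x tickTime. A tickTime of 2000 ms (the value in the
sample configuration) is also assumed, and the function name is
hypothetical. Under those assumptions, the ~4 seconds mentioned above
corresponds to the lower bound.

```python
def negotiated_session_timeout_ms(requested_ms, tick_time_ms=2000,
                                  min_ms=None, max_ms=None):
    """Return the session timeout the server would grant, in milliseconds.

    Assumes the server clamps the requested value to
    [minSessionTimeout, maxSessionTimeout], with defaults of
    2 * tickTime and 20 * tickTime when they are not set explicitly.
    """
    lower = 2 * tick_time_ms if min_ms is None else min_ms
    upper = 20 * tick_time_ms if max_ms is None else max_ms
    return max(lower, min(requested_ms, upper))


# With tickTime = 2000 ms the allowed range is 4000..40000 ms.
for requested in (1000, 10000, 60000):
    print(requested, "->", negotiated_session_timeout_ms(requested))
```

Raising the timeout requested by the client therefore only helps up to
20 x tickTime unless maxSessionTimeout (or tickTime itself) is also raised
on the server side.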
