zookeeper-user mailing list archives

From Hanno Schlichting <ha...@hannosch.eu>
Subject Re: OperationTimeoutException error
Date Sat, 10 Aug 2013 19:02:08 GMT

On Fri, Aug 9, 2013, at 18:13, Deepinder Singh Setia wrote:
> Aug  9 07:07:20 a2s1 python[2085]: OperationTimeoutException: operation
> timeout

That's one of the "retryable exceptions" in Kazoo. If you used
client.retry, you could tolerate one or more occurrences of this error.
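With Kazoo itself the call looks like client.retry(client.get, "/some/path").
The snippet below is not Kazoo's implementation; it is a minimal stdlib sketch
of the same idea, so you can see what "retryable" buys you. The exception
classes are hypothetical stand-ins, and the parameter names (max_tries, delay,
backoff) are assumptions chosen for illustration.

```python
import time


class OperationTimeoutError(Exception):
    """Hypothetical stand-in for a retryable 'operation timeout' error."""


class ConnectionLoss(Exception):
    """Hypothetical stand-in for a retryable connection-loss error."""


# Only these errors are retried; anything else propagates immediately.
RETRYABLE = (OperationTimeoutError, ConnectionLoss)


def retry(func, *args, max_tries=3, delay=0.1, backoff=2.0,
          sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying retryable errors with
    exponential backoff; re-raise after max_tries failed attempts."""
    attempt = 0
    while True:
        try:
            return func(*args, **kwargs)
        except RETRYABLE:
            attempt += 1
            if attempt >= max_tries:
                raise
            sleep(delay)
            delay *= backoff


# Usage: a fake read that times out twice, then succeeds.
calls = {"n": 0}


def flaky_get(path):
    calls["n"] += 1
    if calls["n"] < 3:
        raise OperationTimeoutError("operation timeout")
    return b"data for " + path.encode()


# sleep is stubbed out so the example runs instantly.
print(retry(flaky_get, "/app/config", sleep=lambda s: None))  # b'data for /app/config'
```

Retrying only helps with transient stalls; if the session itself expires,
the retries will not bring it back.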

> zookeeper logs around the error time:
> 2013-08-09 07:07:06,580 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] -
> fsync-ing the write ahead log in SyncThread:0 took 2291ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting
> guide

More than 2 seconds of fsync stall is quite long. With that, or with GC
pauses on top of it, it's quite likely that you exceed the session timeout.
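To see how often and how badly this happens, you could scan the server log
for these fsync warnings. This is a rough sketch that assumes each warning
sits on one log line in the format quoted above; the second sample line is
made up purely for illustration.

```python
import re

# Matches the timestamp and the fsync duration from a ZooKeeper WARN line.
FSYNC_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*?"
    r"fsync-ing the write ahead log in \S+ took (?P<ms>\d+)ms"
)


def slow_fsyncs(lines, threshold_ms=1000):
    """Yield (timestamp, duration_ms) for fsync warnings >= threshold_ms."""
    for line in lines:
        m = FSYNC_RE.search(line)
        if m and int(m.group("ms")) >= threshold_ms:
            yield m.group("ts"), int(m.group("ms"))


sample = [
    # The warning quoted in this thread, joined back onto one line.
    "2013-08-09 07:07:06,580 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - "
    "fsync-ing the write ahead log in SyncThread:0 took 2291ms which will "
    "adversely effect operation latency. See the ZooKeeper troubleshooting guide",
    # Hypothetical second warning, for illustration only.
    "2013-08-09 07:08:00,000 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - "
    "fsync-ing the write ahead log in SyncThread:0 took 1200ms",
]

print(list(slow_fsyncs(sample, threshold_ms=2000)))
# [('2013-08-09 07:07:06,580', 2291)]
```

Against a real file you would pass an open file handle instead of the sample
list; the log path depends on your installation.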

Did you follow the recommendations in
http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html? Especially
those about using a dedicated disk for the transaction log and a
dedicated machine for ZooKeeper, so that other processes can't stall it?

> Could the client (Kazoo) be timing out because of fsync delay? What
> parameter would control duration for OperationTimeoutException that I can
> perhaps increase to verify? There is only ZooKeeper client and the load
> isn't much - 1 read/sec and 2 writes/sec roughly. Zookeeper configuration
> is default. Kazoo client params are also default. 

In the admin guide, look at tickTime and syncLimit. In a default config
the minimum session timeout (twice the tickTime) is ~4 seconds. While
you can increase this value, doing so also increases the minimum time it
takes ZooKeeper to consider a client that has actually died to be dead.
Depending on what you use ZK for, you might prefer failing fast and thus
low session timeout values.
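The client asks for a session timeout (in Kazoo via KazooClient(timeout=...),
in seconds), and the server clamps it into [minSessionTimeout,
maxSessionTimeout], which default to 2 and 20 times tickTime. The sketch
below just does that arithmetic; the tickTime of 2000 ms is an assumption
taken from the sample config, so check your own zoo.cfg.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms=2000,
                               min_session_ms=None, max_session_ms=None):
    """Clamp a requested session timeout into the server's allowed range.

    minSessionTimeout / maxSessionTimeout default to 2 * tickTime and
    20 * tickTime when they are not set explicitly.
    """
    lo = min_session_ms if min_session_ms is not None else 2 * tick_time_ms
    hi = max_session_ms if max_session_ms is not None else 20 * tick_time_ms
    return max(lo, min(requested_ms, hi))


print(negotiated_session_timeout(1000))   # 4000: raised to the minimum
print(negotiated_session_timeout(10000))  # 10000: within range, unchanged
print(negotiated_session_timeout(60000))  # 40000: capped at the maximum
```

Put against the log above, a single 2291 ms fsync stall already eats more
than half of a 4000 ms session, which leaves little room for a GC pause.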

