zookeeper-user mailing list archives

From Nikhil <mnik...@gmail.com>
Subject Re: Problems with running ZK on a shared disk
Date Thu, 23 Jan 2014 20:17:49 GMT
Try forceSync=no. From the ZooKeeper admin guide:

forceSync

(Java system property: *zookeeper.forceSync*)

Requires updates to be synced to media of the transaction log before
finishing processing the update. If this option is set to no, ZooKeeper
will not require updates to be synced to the media.
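
On 3.4.x this is controlled by the Java system property, so one way to set
it is via SERVER_JVMFLAGS (a sketch, assuming the stock zkServer.sh, which
picks up conf/java.env):

    # conf/java.env (sourced by the standard zkEnv.sh/zkServer.sh scripts)
    # Skip fsync-ing the transaction log; trades durability for latency.
    SERVER_JVMFLAGS="-Dzookeeper.forceSync=no"

Restart the server after adding it.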


Note that this is a durability risk: with forceSync=no, an acknowledged
update may exist only in the OS page cache, so it survives a single node
crashing (the rest of the quorum still has it) but not a correlated
failure, e.g. all of your ZooKeeper nodes losing power in the same rack.


Also check this post:
http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/zookeeper_psuedo_scalability_and_absolute


On Thu, Jan 23, 2014 at 10:53 AM, Ahmed H. <ahmed.hammad@gmail.com> wrote:

> Hello,
>
> I am running ZK on a shared disk (I know, I shouldn't be, but I am
> constrained right now) alongside Kafka 0.8 beta. We are seeing very long
> fsync times (according to the logs), followed by our Kafka clients losing
> their connections. Kafka attempts to reconnect a few times and eventually
> dies when it hits the maximum number of retry attempts.
>
> The fsync error is seen below:
>
> 2014-01-23 13:18:38,746 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] -
> fsync-ing the write ahead log in SyncThread:0 took 12762ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2014-01-23 13:23:41,332 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] -
> fsync-ing the write ahead log in SyncThread:0 took 7552ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2014-01-23 13:28:49,656 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] -
> fsync-ing the write ahead log in SyncThread:0 took 6350ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2014-01-23 13:33:45,063 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] -
> fsync-ing the write ahead log in SyncThread:0 took 1039ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2014-01-23 13:34:00,024 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] -
> fsync-ing the write ahead log in SyncThread:0 took 9490ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
> 2014-01-23 13:44:09,003 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] -
> fsync-ing the write ahead log in SyncThread:0 took 8747ms which will
> adversely effect operation latency. See the ZooKeeper troubleshooting guide
>
>
> This is also followed by some of these for good measure:
>
> 2014-01-23 13:49:19,427 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@180] -
> Unexpected Exception:
> java.nio.channels.CancelledKeyException
>   at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
>   at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
>   at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
>   at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
>   at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
>   at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
>   at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
>
>
> The way I see it, I currently have two problems: 1) the ZK setup is an
> issue due to the shared disk, and 2) the Kafka clients do not
> automatically recover when they hit the maximum number of retries. I am
> looking for a way to at least mitigate the ZooKeeper issue. Perhaps I can
> adjust the timeouts so that the Kafka clients don't fail the way they do.
>
> What are the best ways to mitigate the issue for now, as I am limited to a
> single disk? Increasing tickTime? My current ZK config is the default that
> comes with version 3.4.5, so the tickTime is 2000. My Kafka clients have
> defined the zktimeout variable to be 30000.
>
> I realize that this is a ZooKeeper mailing list, and I cannot yet
> pinpoint the exact cause of my problems, but ZK appears to be the
> culprit.
>
> Thanks
>
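
If you are stuck on that shared disk for now, two zoo.cfg knobs are worth
a look. A sketch (untested; the paths and numbers are illustrative, and
dataLogDir only helps if it points at a path with separate backing I/O):

    # zoo.cfg
    # Session timeouts are clamped to [2*tickTime, 20*tickTime] by default,
    # so with tickTime=3000 your Kafka clients' 30000ms timeout sits under
    # a 60000ms ceiling instead of near the default 40000ms one.
    tickTime=3000
    dataDir=/var/lib/zookeeper
    # Put the transaction log on its own device if at all possible, so
    # snapshot writes and other tenants don't stall the fsync path:
    dataLogDir=/var/lib/zookeeper-txnlog

This won't fix multi-second fsyncs, but it gives the clients more headroom
before their sessions expire.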
