zookeeper-user mailing list archives

From "Ahmed H." <ahmed.ham...@gmail.com>
Subject Re: Problems with running ZK on a shared disk
Date Thu, 23 Jan 2014 22:05:21 GMT
Thanks for the response Nikhil.

What about timeouts? I have been reading about increasing timeouts to
alleviate some of those symptoms but I am unsure of which timeouts they are
referring to. Can you provide some insight?
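
For reference, here is a minimal sketch of how a ZooKeeper server bounds the session timeout a client asks for. It assumes the documented defaults, where minSessionTimeout is 2 x tickTime and maxSessionTimeout is 20 x tickTime, and it assumes the Kafka zktimeout value is what the client requests as its session timeout. The function name is hypothetical and only illustrates the arithmetic.

```python
def negotiated_session_timeout(requested_ms, tick_time_ms, min_ms=None, max_ms=None):
    """Clamp a requested session timeout into the server's allowed range.

    Assumes the documented defaults: minSessionTimeout = 2 * tickTime and
    maxSessionTimeout = 20 * tickTime, unless explicit bounds are passed in.
    """
    lo = 2 * tick_time_ms if min_ms is None else min_ms
    hi = 20 * tick_time_ms if max_ms is None else max_ms
    return max(lo, min(requested_ms, hi))


# tickTime=2000 gives an allowed range of 4000..40000 ms.
print(negotiated_session_timeout(30000, 2000))  # 30000: already inside the range
print(negotiated_session_timeout(60000, 2000))  # 40000: capped at 20 * tickTime
```

Under these assumptions, a client timeout above 40000 ms would only take effect if tickTime or maxSessionTimeout were raised on the server as well.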

I currently have a single ZooKeeper instance, so setting forceSync=no
shouldn't have any major downsides in this case. I will certainly give it a
try when I get the chance.

Thanks


On Thu, Jan 23, 2014 at 3:17 PM, Nikhil <mnikhil@gmail.com> wrote:

> Try forceSync=no
>
> forceSync
>
> (Java system property: *zookeeper.forceSync*)
>
> Requires updates to be synced to media of the transaction log before
> finishing processing the update. If this option is set to no, ZooKeeper
> will not require updates to be synced to the media.
>
>
> This is a risk unless your zookeeper nodes are in the same rack.
>
>
> Check also this
>
> http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/zookeeper_psuedo_scalability_and_absolute
>
>
> On Thu, Jan 23, 2014 at 10:53 AM, Ahmed H. <ahmed.hammad@gmail.com> wrote:
>
> > Hello,
> >
> > I am running ZK on a shared disk (I know, I shouldn't be, but I am
> > constrained right now) alongside Kafka 0.8 beta. We are seeing very long
> > fsync times (according to the logs), followed by our Kafka clients losing
> > their connection. Kafka attempts to reconnect a few times and eventually
> > dies because it hits the maximum number of retry attempts.
> >
> > The fsync warnings are shown below:
> >
> > 2014-01-23 13:18:38,746 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 12762ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
> > 2014-01-23 13:23:41,332 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 7552ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
> > 2014-01-23 13:28:49,656 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 6350ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
> > 2014-01-23 13:33:45,063 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 1039ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
> > 2014-01-23 13:34:00,024 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 9490ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
> > 2014-01-23 13:44:09,003 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 8747ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
> >
> >
> > This is also followed by some of these for good measure:
> >
> > 2014-01-23 13:49:19,427 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@180] - Unexpected Exception:
> > java.nio.channels.CancelledKeyException
> >     at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
> >     at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
> >     at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
> >     at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
> >     at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
> >     at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
> >     at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> >
> >
> > The way I see it, I currently have two problems: 1) the ZK setup is an
> > issue due to the shared disk, and 2) the Kafka clients do not recover
> > automatically once they hit the maximum number of retries. I am looking
> > for a way to at least mitigate the ZooKeeper issue, perhaps by modifying
> > the timeouts so that the Kafka clients don't fail the way they do now.
> >
> > What are the best ways to mitigate the issue for now, given that I am
> > limited to a single disk? Increasing tickTime? My current ZK config is the
> > default that comes with version 3.4.5, so the tickTime is 2000. My Kafka
> > clients set the zktimeout variable to 30000.
> >
> > I realize that this is a ZooKeeper mailing list and I cannot yet pinpoint
> > the exact cause of my problems, but ZK appears to be the culprit.
> >
> > Thanks
> >
>
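
To compare the stalls quoted above against the configured timeouts, the fsync durations can be pulled out of the server log with a short script. This is only a sketch: the regular expression is written against the warning text as it appears in this thread, the function name is hypothetical, and the sample holds two of the quoted lines joined back into single log lines.

```python
import re

# Matches the duration reported in the slow-fsync warning quoted in this thread.
FSYNC_RE = re.compile(r"fsync-ing the write ahead log in \S+ took (\d+)ms")


def fsync_durations_ms(log_text):
    """Return every reported fsync duration in milliseconds, in log order."""
    return [int(ms) for ms in FSYNC_RE.findall(log_text)]


# Two of the warnings from the thread, each as one unwrapped log line.
sample = (
    "2014-01-23 13:18:38,746 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - "
    "fsync-ing the write ahead log in SyncThread:0 took 12762ms which will "
    "adversely effect operation latency. See the ZooKeeper troubleshooting guide\n"
    "2014-01-23 13:33:45,063 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - "
    "fsync-ing the write ahead log in SyncThread:0 took 1039ms which will "
    "adversely effect operation latency. See the ZooKeeper troubleshooting guide\n"
)

durations = fsync_durations_ms(sample)
print(durations)       # [12762, 1039]
print(max(durations))  # 12762
```

Running the same function over the full log file gives the worst stall, which can then be set against tickTime and the client session timeout.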
