zookeeper-user mailing list archives

From Neha Narkhede <neha.narkh...@gmail.com>
Subject Re: Problems with running ZK on a shared disk
Date Fri, 24 Jan 2014 01:19:57 GMT
The timeout to increase is the ZooKeeper session timeout. For Kafka, the
relevant config is "zookeeper.session.timeout.ms".
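
One caveat worth sketching: the ZooKeeper server does not simply accept whatever
session timeout a client requests. With the documented defaults, the granted
value is clamped to between 2 and 20 times tickTime. The snippet below is only
an illustration of that rule, assuming minSessionTimeout and maxSessionTimeout
are left unset in zoo.cfg. The class and method names are made up for the
example and are not part of ZooKeeper or Kafka.

```java
// Illustrative sketch only: mirrors the documented ZooKeeper defaults,
// not the actual server code. Names here are hypothetical.
final class SessionTimeoutSketch {

    // Documented defaults when not overridden in zoo.cfg:
    //   minSessionTimeout = 2 * tickTime
    //   maxSessionTimeout = 20 * tickTime
    static int negotiate(int requestedMs, int tickTimeMs) {
        int minMs = 2 * tickTimeMs;
        int maxMs = 20 * tickTimeMs;
        // Clamp the client's request into [minMs, maxMs].
        return Math.max(minMs, Math.min(maxMs, requestedMs));
    }

    public static void main(String[] args) {
        int tickTimeMs = 2000; // the default tickTime mentioned in this thread
        for (int requestedMs : new int[] {1000, 30000, 60000}) {
            System.out.println("requested=" + requestedMs
                    + "ms granted=" + negotiate(requestedMs, tickTimeMs) + "ms");
        }
    }
}
```

Under these assumptions, with tickTime at 2000 a request for 30000 ms is granted
as-is, while anything above 40000 ms is capped unless tickTime or
maxSessionTimeout is raised on the server.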

Thanks,
Neha


On Thu, Jan 23, 2014 at 2:05 PM, Ahmed H. <ahmed.hammad@gmail.com> wrote:

> Thanks for the response Nikhil.
>
> What about timeouts? I have been reading about increasing timeouts to
> alleviate some of those symptoms but I am unsure of which timeouts they are
> referring to. Can you provide some insight?
>
> I currently have one ZooKeeper instance, so setting forceSync=no shouldn't
> have any major downsides in this case. I will certainly give it a try when
> I get the chance.
>
> Thanks
>
>
> On Thu, Jan 23, 2014 at 3:17 PM, Nikhil <mnikhil@gmail.com> wrote:
>
> > Try forceSync=no
> >
> > forceSync
> >
> > (Java system property: *zookeeper.forceSync*)
> >
> > Requires updates to be synced to media of the transaction log before
> > finishing processing the update. If this option is set to no, ZooKeeper
> > will not require updates to be synced to the media.
> >
> >
> > This is a risk unless your zookeeper nodes are in the same rack.
> >
> >
> > Check also this
> >
> >
> > http://www.edwardcapriolo.com/roller/edwardcapriolo/entry/zookeeper_psuedo_scalability_and_absolute
> >
> >
> > On Thu, Jan 23, 2014 at 10:53 AM, Ahmed H. <ahmed.hammad@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > I am running ZK on a shared disk (I know, I shouldn't be, but I am
> > > constrained right now) alongside Kafka 0.8 beta. What we are experiencing
> > > is a problem where we get really long fsync times (according to the logs),
> > > followed by a loss of connection of our Kafka clients. Kafka attempts to
> > > reconnect a few times and eventually it dies because it hits the maximum
> > > retry attempts.
> > >
> > > The fsync error is seen below:
> > >
> > > 2014-01-23 13:18:38,746 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 12762ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
> > > 2014-01-23 13:23:41,332 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 7552ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
> > > 2014-01-23 13:28:49,656 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 6350ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
> > > 2014-01-23 13:33:45,063 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 1039ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
> > > 2014-01-23 13:34:00,024 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 9490ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
> > > 2014-01-23 13:44:09,003 [myid:] - WARN  [SyncThread:0:FileTxnLog@321] - fsync-ing the write ahead log in SyncThread:0 took 8747ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
> > >
> > >
> > > This is also followed by some of these for good measure:
> > >
> > > 2014-01-23 13:49:19,427 [myid:] - ERROR [SyncThread:0:NIOServerCnxn@180] - Unexpected Exception:
> > > java.nio.channels.CancelledKeyException
> > >     at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
> > >     at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
> > >     at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:153)
> > >     at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1076)
> > >     at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
> > >     at org.apache.zookeeper.server.SyncRequestProcessor.flush(SyncRequestProcessor.java:167)
> > >     at org.apache.zookeeper.server.SyncRequestProcessor.run(SyncRequestProcessor.java:101)
> > >
> > >
> > > The way I see it, I currently have two problems: 1) the ZK setup is an
> > > issue because of the shared disk, and 2) the Kafka clients do not
> > > automatically recover once they hit the maximum number of retries. I am
> > > looking for a way to at least mitigate the ZooKeeper issue. Perhaps if I
> > > modify the timeouts in such a way that the Kafka clients don't fail like
> > > they do...
> > >
> > > What are the best ways to mitigate the issue for now, as I am limited to a
> > > single disk? Increasing tickTime? My current ZK config is the default that
> > > comes with version 3.4.5, so the tickTime is 2000. My Kafka clients have
> > > defined the zktimeout variable to be 30000.
> > >
> > > I realize that this is a ZooKeeper mailing list and that I cannot yet
> > > pinpoint the exact cause of my problems, but it appears to me that ZK is
> > > the culprit.
> > >
> > > Thanks
> > >
> >
>
