zookeeper-user mailing list archives

From Raúl Gutiérrez Segalés <...@itevenworks.net>
Subject Re: Unstable work of zookeeper
Date Thu, 24 Sep 2015 18:38:59 GMT
On 24 September 2015 at 06:36, Flavio Junqueira <fpj@apache.org> wrote:

> I can see that the client is disconnecting from the server, and there is
> also a new round of leader election for the zookeeper servers. If this is
> happening, then yeah, your ensemble is unstable. If the ensemble leader
> election is being triggered frequently, then I'd start by looking there.
> Try to determine why the ensemble is failing to continue with the same
> leader. If ensemble elections aren't happening frequently, then another
> possibility is that GC pauses are causing the session to expire.
>
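To put the GC point in numbers: the server expires a session once it has heard nothing from the client for the whole negotiated timeout, which is 20001 ms in the server logs quoted further down. The sketch below assumes ZooKeeper's documented defaults (minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime) and a tickTime of 2000 ms, since the ensemble's actual zoo.cfg isn't shown. `SessionTimeouts`, `negotiate` and `pauseExpiresSession` are hypothetical names for illustration, not ZooKeeper API.

```java
// Sketch assuming ZooKeeper's documented default bounds:
//   minSessionTimeout = 2 * tickTime, maxSessionTimeout = 20 * tickTime.
// Class and method names are hypothetical, not part of ZooKeeper.
final class SessionTimeouts {
    private SessionTimeouts() {}

    /** Clamps a client-requested session timeout (ms) into the default server bounds. */
    static int negotiate(int requestedMs, int tickTimeMs) {
        int minMs = 2 * tickTimeMs;
        int maxMs = 20 * tickTimeMs;
        return Math.max(minMs, Math.min(maxMs, requestedMs));
    }

    /**
     * True if a client-side stall of pauseMs (e.g. a stop-the-world GC) outlasts
     * the negotiated timeout, so the server stops hearing heartbeats for longer
     * than the session is allowed to stay silent. Treat the cutoff as approximate.
     */
    static boolean pauseExpiresSession(long pauseMs, int negotiatedMs) {
        return pauseMs > negotiatedMs;
    }
}
```

With the assumed tickTime of 2000 ms, a 20001 ms request falls inside [4000, 40000] and is kept as-is, so a pause of roughly 20 s or more on the client would be enough to lose the session; GC logs covering the failure window would show whether pauses that long actually happen.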

On the other hand, if it's a low-traffic cluster, you might need to enable
TCP keepalives to ensure that the election connections between the cluster
members don't go away (the ZAB connections, by contrast, have protocol-level
pings iirc, so those should be fine):

https://issues.apache.org/jira/browse/ZOOKEEPER-1748
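For reference, this is what turning on TCP keepalive for a connection looks like with the standard `java.net.Socket` API. It only illustrates the socket option; it is not the code from the ZOOKEEPER-1748 patch, and `KeepAliveExample` is a hypothetical name. Keep in mind that SO_KEEPALIVE only asks the OS to probe idle connections: when the first probe goes out is an OS setting (on Linux, `net.ipv4.tcp_keepalive_time`, 7200 s by default), so it may need lowering if something on the network path drops idle connections sooner than that.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper illustrating SO_KEEPALIVE; not ZooKeeper's election code.
final class KeepAliveExample {
    private KeepAliveExample() {}

    /** Opens a TCP connection with SO_KEEPALIVE enabled so the OS probes it when idle. */
    static Socket connectWithKeepAlive(String host, int port, int connectTimeoutMs)
            throws IOException {
        Socket sock = new Socket();
        try {
            // Options may be set on the unconnected socket before connect().
            sock.setKeepAlive(true);
            sock.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            return sock;
        } catch (IOException e) {
            sock.close(); // don't leak the descriptor if connect fails
            throw e;
        }
    }
}
```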


-rgs



> -Flavio
>
> > On 24 Sep 2015, at 05:24, Akmal Abbasov <akmal.abbasov@icloud.com> wrote:
> >
> > Hi,
> > I am using ZooKeeper 3.4.6.
> > I have a Spark cluster configured with HA. Once every 1-2 days, the active
> > Spark master shuts down with the following message:
> > 15/09/23 18:58:18 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x34ffa68dbd10021, likely server has closed socket, closing socket connection and attempting reconnect
> > 15/09/23 18:58:18 INFO state.ConnectionStateManager: State change: SUSPENDED
> > 15/09/23 18:58:18 INFO master.ZooKeeperLeaderElectionAgent: We have lost leadership
> > 15/09/23 18:58:18 ERROR master.Master: Leadership has been revoked -- master shutting down.
> > 15/09/23 18:58:18 INFO util.Utils: Shutdown hook called
> >
> > I don’t have the ZooKeeper logs from the same period, but the logs are
> > full of these messages:
> > 2015-09-24 05:07:42,228 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.0.8.4:34705
> > 2015-09-24 05:07:42,229 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /10.0.8.4:34705; will be dropped if server is in r-o mode
> > 2015-09-24 05:07:42,229 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.0.8.4:34705
> > 2015-09-24 05:07:42,292 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x14ffd3670130030 with negotiated timeout 20001 for client /10.0.8.4:34705
> > 2015-09-24 05:07:42,302 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
> > EndOfStreamException: Unable to read additional data from client sessionid 0x14ffd3670130030, likely client has closed socket
> >       at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> >       at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> >       at java.lang.Thread.run(Thread.java:745)
> > 2015-09-24 05:07:42,303 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.0.8.4:34705 which had sessionid 0x14ffd3670130030
> > 2015-09-24 05:07:42,314 [myid:1] - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
> > java.nio.channels.CancelledKeyException
> >       at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
> >       at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
> >       at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
> >       at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
> >       at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
> >       at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
> > 2015-09-24 05:07:42,334 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.0.8.4:34707
> > 2015-09-24 05:07:42,334 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822] - Connection request from old client /10.0.8.4:34707; will be dropped if server is in r-o mode
> > 2015-09-24 05:07:42,335 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /10.0.8.4:34707
> > 2015-09-24 05:07:42,357 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established session 0x14ffd3670130031 with negotiated timeout 20001 for client /10.0.8.4:34707
> > 2015-09-24 05:07:42,364 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357] - caught end of stream exception
> > EndOfStreamException: Unable to read additional data from client sessionid 0x14ffd3670130031, likely client has closed socket
> >       at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> >       at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> >       at java.lang.Thread.run(Thread.java:745)
> > 2015-09-24 05:07:42,365 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.0.8.4:34707 which had sessionid 0x14ffd3670130031
> > 2015-09-24 05:07:42,376 [myid:1] - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected Exception:
> > java.nio.channels.CancelledKeyException
> >       at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
> >       at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
> >       at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
> >       at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
> >       at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
> >       at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
> >
> > There are also these:
> > 2015-09-24 06:29:54,459 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139] - Shutting down
> > 2015-09-24 06:29:54,459 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@441] - shutting down
> > 2015-09-24 06:29:54,459 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@105] - Shutting down
> > 2015-09-24 06:29:54,459 [myid:1] - INFO  [FollowerRequestProcessor:1:FollowerRequestProcessor@95] - FollowerRequestProcessor exited loop!
> > 2015-09-24 06:29:54,460 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:CommitProcessor@181] - Shutting down
> > 2015-09-24 06:29:54,464 [myid:1] - INFO  [CommitProcessor:1:CommitProcessor@150] - CommitProcessor exited loop!
> > 2015-09-24 06:29:54,465 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@415] - shutdown of request processor complete
> > 2015-09-24 06:29:54,466 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@209] - Shutting down
> > 2015-09-24 06:29:54,466 [myid:1] - INFO  [SyncThread:1:SyncRequestProcessor@187] - SyncRequestProcessor exited!
> > 2015-09-24 06:29:54,466 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@714] - LOOKING
> > 2015-09-24 06:29:54,584 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.0.8.58:36137
> > 2015-09-24 06:29:54,584 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
> > 2015-09-24 06:29:54,584 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.0.8.58:36137 (no session established for client)
> > 2015-09-24 06:29:54,679 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /10.0.8.57:57410
> > 2015-09-24 06:29:54,680 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362] - Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
> > 2015-09-24 06:29:54,680 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /10.0.8.57:57410 (no session established for client)
> >
> > I also observed that hadoop-zkfc restarts very frequently.
> > Any ideas what could be wrong?
> >
> > Thanks.
>
>
