zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Flavio Junqueira <...@apache.org>
Subject Re: Unstable work of zookeeper
Date Thu, 24 Sep 2015 13:36:07 GMT
I can see that the client is disconnecting from the server, and there is also a new round of
leader election for the zookeeper servers. If this is happening, then yeah, your ensemble
is unstable. If the ensemble leader election is being triggered frequently, then I'd start
by looking there. Try to determine why the ensemble is failing to continue with the same leader.
If ensemble elections aren't happening frequently, then another possibility is that GC pauses
are causing the session to expire.

-Flavio

> On 24 Sep 2015, at 05:24, Akmal Abbasov <akmal.abbasov@icloud.com> wrote:
> 
> Hi,
> I am using zookeeper 3.4.6
> I have a spark cluster configured with HA. Once per 1-2 days, the active spark master
is shutting down with a message
> 15/09/23 18:58:18 INFO zookeeper.ClientCnxn: Unable to read additional data from server
sessionid 0x34ffa68dbd10021, likely server has closed socket, closing socket connection and
attempting reconnect
> 15/09/23 18:58:18 INFO state.ConnectionStateManager: State change: SUSPENDED
> 15/09/23 18:58:18 INFO master.ZooKeeperLeaderElectionAgent: We have lost leadership
> 15/09/23 18:58:18 ERROR master.Master: Leadership has been revoked -- master shutting
down.
> 15/09/23 18:58:18 INFO util.Utils: Shutdown hook called
> 
> I don’t have the zookeeper logs from the same period, but the logs are full of the
these messages 
> 2015-09-24 05:07:42,228 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
- Accepted socket connection from /10.0.8.4:34705
> 2015-09-24 05:07:42,229 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822]
- Connection request from old client /10.0.8.4:34705; will be dropped if server is in r-o
mode
> 2015-09-24 05:07:42,229 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868]
- Client attempting to establish new session at /10.0.8.4:34705
> 2015-09-24 05:07:42,292 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established
session 0x14ffd3670130030 with negotiated timeout 20001 for client /10.0.8.4:34705
> 2015-09-24 05:07:42,302 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357]
- caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 0x14ffd3670130030,
likely client has closed socket
> 	at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> 	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-09-24 05:07:42,303 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007]
- Closed socket connection for client /10.0.8.4:34705 which had sessionid 0x14ffd3670130030
> 2015-09-24 05:07:42,314 [myid:1] - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected
Exception:
> java.nio.channels.CancelledKeyException
> 	at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
> 	at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
> 	at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
> 	at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
> 	at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
> 	at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
> 2015-09-24 05:07:42,334 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
- Accepted socket connection from /10.0.8.4:34707
> 2015-09-24 05:07:42,334 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@822]
- Connection request from old client /10.0.8.4:34707; will be dropped if server is in r-o
mode
> 2015-09-24 05:07:42,335 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868]
- Client attempting to establish new session at /10.0.8.4:34707
> 2015-09-24 05:07:42,357 [myid:1] - INFO  [CommitProcessor:1:ZooKeeperServer@617] - Established
session 0x14ffd3670130031 with negotiated timeout 20001 for client /10.0.8.4:34707
> 2015-09-24 05:07:42,364 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@357]
- caught end of stream exception
> EndOfStreamException: Unable to read additional data from client sessionid 0x14ffd3670130031,
likely client has closed socket
> 	at org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:228)
> 	at org.apache.zookeeper.server.NIOServerCnxnFactory.run(NIOServerCnxnFactory.java:208)
> 	at java.lang.Thread.run(Thread.java:745)
> 2015-09-24 05:07:42,365 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007]
- Closed socket connection for client /10.0.8.4:34707 which had sessionid 0x14ffd3670130031
> 2015-09-24 05:07:42,376 [myid:1] - ERROR [CommitProcessor:1:NIOServerCnxn@178] - Unexpected
Exception:
> java.nio.channels.CancelledKeyException
> 	at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:73)
> 	at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:77)
> 	at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
> 	at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
> 	at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:404)
> 	at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
> 
> Also there are 
> 2015-09-24 06:29:54,459 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FollowerZooKeeperServer@139]
- Shutting down
> 2015-09-24 06:29:54,459 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@441]
- shutting down
> 2015-09-24 06:29:54,459 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FollowerRequestProcessor@105]
- Shutting down
> 2015-09-24 06:29:54,459 [myid:1] - INFO  [FollowerRequestProcessor:1:FollowerRequestProcessor@95]
- FollowerRequestProcessor exited loop!
> 2015-09-24 06:29:54,460 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:CommitProcessor@181]
- Shutting down
> 2015-09-24 06:29:54,464 [myid:1] - INFO  [CommitProcessor:1:CommitProcessor@150] - CommitProcessor
exited loop!
> 2015-09-24 06:29:54,465 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@415]
- shutdown of request processor complete
> 2015-09-24 06:29:54,466 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@209]
- Shutting down
> 2015-09-24 06:29:54,466 [myid:1] - INFO  [SyncThread:1:SyncRequestProcessor@187] - SyncRequestProcessor
exited!
> 2015-09-24 06:29:54,466 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:QuorumPeer@714]
- LOOKING
> 2015-09-24 06:29:54,584 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
- Accepted socket connection from /10.0.8.58:36137
> 2015-09-24 06:29:54,584 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362]
- Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
> 2015-09-24 06:29:54,584 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007]
- Closed socket connection for client /10.0.8.58:36137 (no session established for client)
> 2015-09-24 06:29:54,679 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
- Accepted socket connection from /10.0.8.57:57410
> 2015-09-24 06:29:54,680 [myid:1] - WARN  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@362]
- Exception causing close of session 0x0 due to java.io.IOException: ZooKeeperServer not running
> 2015-09-24 06:29:54,680 [myid:1] - INFO  [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007]
- Closed socket connection for client /10.0.8.57:57410 (no session established for client)
> 
> I also observed that hadoop-zkfc restarts very frequently.
> Any ideas what could be wrong?
> 
> Thanks.


Mime
View raw message