flink-dev mailing list archives

From Gyula Fóra <gyula.f...@gmail.com>
Subject Zookeeper failure handling
Date Fri, 22 Sep 2017 11:54:13 GMT
Hi all,

We have observed that when some nodes of the ZooKeeper cluster are
restarted (as part of a rolling restart), the Flink streaming jobs fail
and then restart.

Log excerpt:

2017-09-22 12:54:41,426 INFO  org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x15cba6e1a239774, likely server has closed socket, closing socket connection and attempting reconnect
2017-09-22 12:54:41,527 INFO  org.apache.flink.shaded.org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
2017-09-22 12:54:41,528 WARN  org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Connection to ZooKeeper suspended. The contender akka.tcp://flink@splat.sto.midasplayer.com:42118/user/jobmanager no longer participates in the leader election.
2017-09-22 12:54:41,528 WARN  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Connection to ZooKeeper suspended. Can no longer retrieve the leader from ZooKeeper.
2017-09-22 12:54:41,528 WARN  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Connection to ZooKeeper suspended. Can no longer retrieve the leader from ZooKeeper.
2017-09-22 12:54:41,530 WARN  org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - ZooKeeper connection SUSPENDED. Changes to the submitted job graphs are not monitored (temporarily).
2017-09-22 12:54:41,530 INFO  org.apache.flink.yarn.YarnJobManager - JobManager akka://flink/user/jobmanager#-317276879 was revoked leadership.
2017-09-22 12:54:41,532 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph - Job event.game.log (2ad7bbcc476bbe3735954fc414ffcb97) switched from state RUNNING to SUSPENDED.
java.lang.Exception: JobManager is no longer the leader.
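
To make the sequence in the log concrete, here is a minimal, self-contained
Java sketch of two policies a leader-election contender could follow when
the ZooKeeper connection state changes. It does not use Curator or Flink
classes: ConnState, Policy and Contender are hypothetical names made up for
illustration, and the sketch does not claim to mirror Flink's actual
implementation. The only thing taken from the log is that leadership was
revoked right after the SUSPENDED state change.

```java
import java.util.ArrayList;
import java.util.List;

class SuspendedHandlingSketch {

    // Hypothetical stand-in for a ZooKeeper client's connection states
    // (not the real Curator enum).
    enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    // Two possible reactions to a temporarily suspended connection.
    enum Policy { REVOKE_ON_SUSPENDED, REVOKE_ON_LOST_ONLY }

    // Hypothetical contender that starts out as leader.
    static final class Contender {
        private final Policy policy;
        private boolean leader = true;
        private final List<String> events = new ArrayList<>();

        Contender(Policy policy) {
            this.policy = policy;
        }

        void onStateChange(ConnState state) {
            // LOST always revokes; SUSPENDED revokes only under the strict policy.
            boolean revoke = state == ConnState.LOST
                    || (state == ConnState.SUSPENDED
                        && policy == Policy.REVOKE_ON_SUSPENDED);
            if (revoke && leader) {
                leader = false;
                events.add("revoked on " + state);
            }
        }

        boolean isLeader() {
            return leader;
        }

        List<String> events() {
            return events;
        }
    }

    public static void main(String[] args) {
        // Simulate a rolling restart: the connection drops briefly, then recovers.
        ConnState[] rollingRestart = { ConnState.SUSPENDED, ConnState.RECONNECTED };
        for (Policy policy : Policy.values()) {
            Contender contender = new Contender(policy);
            for (ConnState state : rollingRestart) {
                contender.onStateChange(state);
            }
            System.out.println(policy + " -> leader=" + contender.isLeader()
                    + ", events=" + contender.events());
        }
    }
}
```

Under the strict policy the contender gives up leadership on the first
SUSPENDED event, which matches the order of events in the log above. Under
the lenient policy a short disconnect is ridden out, and leadership is
given up only once the state becomes LOST.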


Is this the expected behaviour?

Thanks,
Gyula
