flink-dev mailing list archives

From Gyula Fóra <gyula.f...@gmail.com>
Subject Re: Zookeeper failure handling
Date Fri, 22 Sep 2017 16:41:42 GMT
We are using 1.3.2

Gyula

On Fri, Sep 22, 2017, 17:13 Ted Yu <yuzhihong@gmail.com> wrote:

> Which release are you using?
>
> Flink 1.3.2 uses Curator 2.12.0 which solves some leader election issues.
>
> Mind giving 1.3.2 a try?
>
> On Fri, Sep 22, 2017 at 4:54 AM, Gyula Fóra <gyula.fora@gmail.com> wrote:
>
> > Hi all,
> >
> > We have observed that when some nodes of the ZK cluster are restarted
> > (during a rolling restart), the Flink streaming jobs fail and restart.
> >
> > Log excerpt:
> >
> > 2017-09-22 12:54:41,426 INFO  org.apache.zookeeper.ClientCnxn - Unable to read additional data from server sessionid 0x15cba6e1a239774, likely server has closed socket, closing socket connection and attempting reconnect
> > 2017-09-22 12:54:41,527 INFO  org.apache.flink.shaded.org.apache.curator.framework.state.ConnectionStateManager - State change: SUSPENDED
> > 2017-09-22 12:54:41,528 WARN  org.apache.flink.runtime.leaderelection.ZooKeeperLeaderElectionService - Connection to ZooKeeper suspended. The contender akka.tcp://flink@splat.sto.midasplayer.com:42118/user/jobmanager no longer participates in the leader election.
> > 2017-09-22 12:54:41,528 WARN  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Connection to ZooKeeper suspended. Can no longer retrieve the leader from ZooKeeper.
> > 2017-09-22 12:54:41,528 WARN  org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Connection to ZooKeeper suspended. Can no longer retrieve the leader from ZooKeeper.
> > 2017-09-22 12:54:41,530 WARN  org.apache.flink.runtime.jobmanager.ZooKeeperSubmittedJobGraphStore - ZooKeeper connection SUSPENDED. Changes to the submitted job graphs are not monitored (temporarily).
> > 2017-09-22 12:54:41,530 INFO  org.apache.flink.yarn.YarnJobManager - JobManager akka://flink/user/jobmanager#-317276879 was revoked leadership.
> > 2017-09-22 12:54:41,532 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph - Job event.game.log (2ad7bbcc476bbe3735954fc414ffcb97) switched from state RUNNING to SUSPENDED.
> > java.lang.Exception: JobManager is no longer the leader.
> >
> >
> > Is this the expected behaviour?
> >
> > Thanks,
> > Gyula
> >
>
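
The log above shows the sequence SUSPENDED -> contender leaves the election -> leadership revoked -> job SUSPENDED. As a rough illustration of why a brief disconnect is enough to fail the job, here is a minimal, self-contained Java sketch. It is a toy model only, not Flink or Curator code: the class LeaderOnSuspendSketch, the RevokePolicy enum and the replay method are hypothetical names invented for this example, and the ON_LOST_ONLY policy is an assumed alternative for comparison, not something the thread says Flink supports. The ConnectionState names are a subset of Curator's connection states.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the behaviour in the log; hypothetical, not Flink or Curator code. */
public class LeaderOnSuspendSketch {

    /** Subset of the connection state names that Curator reports. */
    enum ConnectionState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

    /** Hypothetical policies for when a contender gives up leadership. */
    enum RevokePolicy { ON_SUSPENDED, ON_LOST_ONLY }

    /** Replays state changes and records whether leadership is still held after each one. */
    static List<Boolean> replay(RevokePolicy policy, List<ConnectionState> events) {
        boolean leader = true; // assumption: leadership was held before the restart
        List<Boolean> history = new ArrayList<>();
        for (ConnectionState state : events) {
            switch (state) {
                case SUSPENDED:
                    // The log shows leadership being revoked at this point.
                    if (policy == RevokePolicy.ON_SUSPENDED) {
                        leader = false;
                    }
                    break;
                case LOST:
                    // A lost session gives up leadership under either policy.
                    leader = false;
                    break;
                default:
                    // CONNECTED / RECONNECTED: this model never re-grants
                    // leadership; a real election would have to run again.
                    break;
            }
            history.add(leader);
        }
        return history;
    }

    public static void main(String[] args) {
        // A rolling restart as seen by one client: short disconnect, then reconnect.
        List<ConnectionState> rollingRestart =
                Arrays.asList(ConnectionState.SUSPENDED, ConnectionState.RECONNECTED);
        System.out.println(replay(RevokePolicy.ON_SUSPENDED, rollingRestart)); // [false, false]
        System.out.println(replay(RevokePolicy.ON_LOST_ONLY, rollingRestart)); // [true, true]
    }
}
```

Under the first policy the short disconnect already costs the JobManager its leadership, which matches the log; under the assumed second policy the same sequence would leave it in place.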
