flink-user mailing list archives

From Dinesh J <dineshj...@gmail.com>
Subject Issue with single job yarn flink cluster HA
Date Sun, 22 Mar 2020 07:55:38 GMT
Hi all,
We have a single-job Flink cluster on YARN set up with high availability.
Sometimes, after a JobManager failure, the next attempt restarts successfully
from the latest checkpoint.
But occasionally we get the error below.

{"errors":["Service temporarily unavailable due to an ongoing leader election. Please refresh."]}
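
For reference, here is a minimal sketch of how a client could tell this response apart from a normal one and wait for the election to finish. It is only an illustration under assumptions: the helper names (`is_leader_election_response`, `http_get`, `wait_for_leader`), the retry counts, and the URL in the usage note are hypothetical, and only the error text is taken from the response above.

```python
import json
import time
import urllib.error
import urllib.request

# Substring taken from the error message quoted above.
LEADER_ELECTION_HINT = "ongoing leader election"


def is_leader_election_response(body: str) -> bool:
    """Return True if body looks like the leader-election error JSON."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    if not isinstance(payload, dict):
        return False
    errors = payload.get("errors", [])
    if not isinstance(errors, list):
        return False
    return any(LEADER_ELECTION_HINT in str(e) for e in errors)


def http_get(url: str, timeout: float = 5.0) -> str:
    """Fetch url and return the body, also for HTTP error statuses.

    Connection-level failures (urllib.error.URLError) are not handled
    here and propagate to the caller.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.read().decode("utf-8", errors="replace")


def wait_for_leader(url, fetch=http_get, attempts=10, delay_s=3.0,
                    sleep=time.sleep):
    """Poll url until the leader-election error is gone.

    Returns the number of polls made, or -1 if the error was still
    returned after `attempts` polls.
    """
    for attempt in range(1, attempts + 1):
        if not is_leader_election_response(fetch(url)):
            return attempt
        sleep(delay_s)
    return -1
```

A call might look like `wait_for_leader("http://<rest-address>/jobs/overview")`, where the address is a placeholder for wherever the cluster's REST endpoint is reachable. If this returns -1 even with a generous number of attempts, the election is not completing at all, which is the case this message is asking about.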

Hadoop version: 2.7.1.2.4.0.0-169

Flink version: flink-1.7.2

ZooKeeper version: 3.4.6-169--1


*Below is the Flink configuration:*

high-availability: zookeeper
high-availability.zookeeper.quorum: host1:2181,host2:2181,host3:2181
high-availability.storageDir: hdfs:///flink/ha
high-availability.zookeeper.path.root: /flink
yarn.application-attempts: 10
state.backend: rocksdb
state.checkpoints.dir: hdfs:///flink/checkpoint
state.savepoints.dir: hdfs:///flink/savepoint
jobmanager.execution.failover-strategy: region
restart-strategy: failure-rate
restart-strategy.failure-rate.max-failures-per-interval: 3
restart-strategy.failure-rate.failure-rate-interval: 5 min
restart-strategy.failure-rate.delay: 10 s
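
To make the intent of the failure-rate settings above concrete, here is a rough sketch of a sliding-window check. It is a simplified model of how such a limit could behave, not Flink's actual implementation; the function name `make_failure_rate_check` and the timestamps in the usage note are made up for illustration.

```python
from collections import deque


def make_failure_rate_check(max_failures: int, interval_s: float):
    """Build a checker that allows at most max_failures per interval_s.

    Simplified model only: it counts failures inside a sliding window
    and ignores the restart delay.
    """
    recent = deque()  # timestamps (seconds) of failures still in the window

    def on_failure(now_s: float) -> bool:
        """Record a failure; return True if a restart is still allowed."""
        # Forget failures that fell out of the window.
        while recent and now_s - recent[0] > interval_s:
            recent.popleft()
        recent.append(now_s)
        return len(recent) <= max_failures

    return on_failure
```

With the values from the configuration (3 failures per 5 min), `check = make_failure_rate_check(3, 300)` would allow restarts for failures at 0 s, 60 s and 120 s, and refuse on a fourth failure at 180 s because four failures then fall inside one 300 s window.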



Can someone let me know whether I am missing something, or whether this is a known issue?

Could it be related to a hostname-to-IP mapping problem or to the ZooKeeper version?

Thanks,

Dinesh
