flink-user mailing list archives

From Shannon Carey <sca...@expedia.com>
Subject failure-rate restart strategy not working?
Date Thu, 05 Jan 2017 19:50:53 GMT
I recently updated my cluster configuration with the following settings:

restart-strategy: failure-rate
restart-strategy.failure-rate.max-failures-per-interval: 3
restart-strategy.failure-rate.failure-rate-interval: 5 min
restart-strategy.failure-rate.delay: 10 s

The settings show up in the JobManager web UI, as expected. I am not setting the restart strategy
programmatically, but the job does have checkpointing enabled.

However, if I launch a job that (intentionally) fails every 10 seconds by throwing a RuntimeException,
it keeps restarting well past the limit of 3 failures per 5-minute interval.
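For reference, here is a minimal sketch of the behaviour I expect from these settings. It is a hypothetical model: the class FailureRateSketch and its methods are made up for illustration and are not Flink's own restart-strategy code. It assumes a failure may trigger a restart only while at most max-failures-per-interval failures fall within the trailing failure-rate-interval. With the job failing 10 s after each start and a 10 s restart delay, failures land about every 20 s, so the fourth failure should stop the job.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a failure-rate restart policy; NOT Flink's implementation,
// just the semantics I expect from the configuration above.
final class FailureRateSketch {
    private final int maxFailuresPerInterval;
    private final long intervalMillis;
    // Timestamps (ms) of the failures inside the current sliding window.
    private final Deque<Long> failureTimes = new ArrayDeque<>();

    FailureRateSketch(int maxFailuresPerInterval, long intervalMillis) {
        this.maxFailuresPerInterval = maxFailuresPerInterval;
        this.intervalMillis = intervalMillis;
    }

    // Records a failure at nowMillis; returns true if a restart is still allowed.
    boolean onFailure(long nowMillis) {
        // Forget failures that are at least one interval old.
        while (!failureTimes.isEmpty() && nowMillis - failureTimes.peekFirst() >= intervalMillis) {
            failureTimes.pollFirst();
        }
        failureTimes.addLast(nowMillis);
        // Restart only while the window holds no more than the allowed failures.
        return failureTimes.size() <= maxFailuresPerInterval;
    }

    // Simulates a job that fails runMillis after each start and waits delayMillis
    // before restarting; returns the number of restarts granted (at most maxSteps).
    static int restartsGranted(int maxFailures, long intervalMillis,
                               long runMillis, long delayMillis, int maxSteps) {
        FailureRateSketch policy = new FailureRateSketch(maxFailures, intervalMillis);
        long now = 0;
        int restarts = 0;
        for (int i = 0; i < maxSteps; i++) {
            now += runMillis;            // job throws after running for runMillis
            if (!policy.onFailure(now)) {
                return restarts;         // failure rate exceeded: job should fail
            }
            restarts++;
            now += delayMillis;          // corresponds to restart-strategy.failure-rate.delay
        }
        return restarts;
    }

    public static void main(String[] args) {
        // 3 failures per 5 min, 10 s delay, job fails 10 s after each start.
        int restarts = restartsGranted(3, 5 * 60_000L, 10_000L, 10_000L, 100);
        System.out.println("restarts granted: " + restarts); // prints "restarts granted: 3"
    }
}
```

Under this model the job should be declared failed on its fourth failure, roughly 70 s after submission, rather than restarting indefinitely.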

Does anyone know why this might be happening? Any ideas of things I could check?
