flink-user mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: Issues testing Flink HA w/ ZooKeeper
Date Mon, 15 Feb 2016 12:52:08 GMT

> On 15 Feb 2016, at 13:40, Stefano Baghino <stefano.baghino@radicalbit.io> wrote:
> Hi Ufuk, thanks for replying. 
> Regarding the masters file: yes, I've specified all the masters and checked out that
> they were actually running after the start-cluster.sh. I'll gladly share the logs as soon
> as I get to see them.
> Regarding the state backend: how does having a non-distributed storage as the state backend
> influence the HA features? I thought it would have meant that the job state couldn't be restored
> but the job itself could've been started after the backup job manager started. Does not having
> a reliable distributed storage service as the state backend mean that the HA features don't

No, the submitted job is also stored in the state backend and is recovered from there.
ZooKeeper holds a pointer to the state handle in the configured backend. Since all job managers
run on the same host, it should work as you expected. The requirement is that all job managers
can access the state backend.
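
To illustrate the point above, here is a minimal toy sketch in Python. It is not Flink code: the class names (`FileStateBackend`, `JobManager`), the plain dict standing in for ZooKeeper, and the job id are all hypothetical and exist only for this illustration. It assumes a file-based backend in which the state handle is a key resolved against the backend's root directory, so a job manager can recover a job only if it sees the same storage.

```python
import os
import pickle
import tempfile
import uuid


class FileStateBackend:
    """Hypothetical file-based backend: stores blobs under a root directory."""

    def __init__(self, root):
        self.root = root

    def put(self, obj):
        # The returned key plays the role of the "state handle".
        handle = uuid.uuid4().hex
        with open(os.path.join(self.root, handle), "wb") as f:
            pickle.dump(obj, f)
        return handle

    def get(self, handle):
        # Resolving the handle requires access to the same storage.
        with open(os.path.join(self.root, handle), "rb") as f:
            return pickle.load(f)


class JobManager:
    """Hypothetical job manager: keeps only a pointer in the coordination store."""

    def __init__(self, name, coordination, backend):
        self.name = name
        self.coordination = coordination  # stands in for ZooKeeper
        self.backend = backend

    def submit(self, job_id, job):
        handle = self.backend.put(job)      # job itself goes to the backend
        self.coordination[job_id] = handle  # coordination store gets the pointer

    def recover(self, job_id):
        handle = self.coordination[job_id]
        return self.backend.get(handle)


shared_dir = tempfile.mkdtemp()  # storage visible to both job managers
zk = {}                          # toy replacement for ZooKeeper

leader = JobManager("jm-1", zk, FileStateBackend(shared_dir))
standby = JobManager("jm-2", zk, FileStateBackend(shared_dir))

job = {"name": "wordcount", "parallelism": 4}
leader.submit("job-42", job)

# The standby follows the pointer and reads the job from the shared backend.
recovered = standby.recover("job-42")

# A job manager whose backend points at different storage cannot recover.
isolated = JobManager("jm-3", zk, FileStateBackend(tempfile.mkdtemp()))
try:
    isolated.recover("job-42")
    isolated_ok = True
except FileNotFoundError:
    isolated_ok = False
```

In this sketch the standby recovers the job because both job managers resolve the handle against the same directory, while the third one fails because its backend cannot see the stored file.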

Recovery from a job manager failure is currently independent of the execution retries.

I think as soon as we have a look at the logs, we will figure it out. ;)

– Ufuk
