flink-user mailing list archives

From Stefano Baghino <stefano.bagh...@radicalbit.io>
Subject Re: Issues testing Flink HA w/ ZooKeeper
Date Mon, 15 Feb 2016 12:41:45 GMT
Hi Maximilian,

thank you for the reply. I checked the documentation before running
my tests (I'm not expert enough to not read the docs ;)), but it doesn't
mention any specific requirement regarding execution retries. I'll
look into it, thanks!
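
A minimal sketch of where the retry count discussed in the quoted reply could be set as a cluster-wide default, next to the ZooKeeper HA settings. The key names are as recalled from the Flink 0.10-era configuration docs and should be checked against the documentation for the version in use. The host names and the retry count of 3 are hypothetical placeholders, and other HA settings (such as storage locations) are left out. The per-job equivalent is, as far as I know, setNumberOfExecutionRetries on the StreamExecutionEnvironment.

```yaml
# flink-conf.yaml (sketch only; key names assumed from the Flink 0.10-era docs,
# verify them against the documentation for your version)

# Default number of times a failed job is retried.
# The quoted reply says this is 0 by default; 3 is an arbitrary example value.
execution-retries.default: 3

# ZooKeeper-based JobManager high availability.
recovery.mode: zookeeper

# Hypothetical host names standing in for the three masters / ZooKeeper ensemble.
recovery.zookeeper.quorum: master1:2181,master2:2181,master3:2181
```

A value set on the StreamExecutionEnvironment applies to that job only, whereas the config entry acts as the default for jobs that don't set one.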

On Mon, Feb 15, 2016 at 12:51 PM, Maximilian Michels <mxm@apache.org> wrote:

> Hi Stefano,
>
> The job should stop temporarily but then be resumed by the new
> JobManager. Have you increased the number of execution retries? AFAIK,
> it is set to 0 by default. With 0 retries the job will not be re-run,
> even in HA mode. You can set it on the StreamExecutionEnvironment.
>
> Otherwise, you have probably already found the documentation:
>
> https://ci.apache.org/projects/flink/flink-docs-master/setup/jobmanager_high_availability.html#configuration
>
> Cheers,
> Max
>
> On Mon, Feb 15, 2016 at 12:35 PM, Stefano Baghino
> <stefano.baghino@radicalbit.io> wrote:
> > Hello everyone,
> >
> > last week I ran some tests with Apache ZooKeeper to get a grip on Flink's
> > HA features. My tests have gone badly so far and I can't work out the reason.
> >
> > My latest tests involved Flink 0.10.2, run as a standalone cluster with 3
> > masters and 4 slaves. The 3 masters are also the ZooKeeper (3.4.6) ensemble.
> > I started ZooKeeper on each machine, tested its availability and then
> > started the Flink cluster. Since there's no reliable distributed filesystem
> > on the cluster, I had to use the local file system as the state backend.
> >
> > I then submitted a very simple streaming job that writes the timestamp to a
> > text file on the local file system each second, and then killed the
> > process running the job manager to verify that another job manager takes
> > over. However, the job just stopped. I still have to perform some checks on
> > the handover to the new job manager, but before digging deeper I wanted to
> > ask whether my expectation that the job keeps running despite the job
> > manager failure is unreasonable.
> >
> > Thanks in advance.
> >
> > --
> > BR,
> > Stefano Baghino
> >
> > Software Engineer @ Radicalbit
>



-- 
BR,
Stefano Baghino

Software Engineer @ Radicalbit
