flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: keep-alive job strategy
Date Mon, 09 Oct 2017 07:01:53 GMT
Hi Rob,

yes, this behavior is expected. Flink does not automatically scale down a
job in case of a failure.
You have to ensure that enough resources are available to continue
processing.
In Flink's standalone cluster mode, the common practice is to keep stand-by
TMs available (the same holds for JMs if you need an HA setup).
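For example (a rough sketch, using the Flink 1.x configuration keys — adjust to
your version), you would enable a restart strategy in flink-conf.yaml so that a
failed job is restarted on the remaining/stand-by TMs instead of going to FAILED:

```yaml
# flink-conf.yaml (Flink 1.x key names)
# Retry the job a few times with a delay between attempts,
# so a stand-by TM can pick up the freed slots.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 10 s

# One slot per TM, matching the setup described in the question.
taskmanager.numberOfTaskSlots: 1
```

and start one or more extra TaskManagers beyond what the job's parallelism
needs (e.g. bin/taskmanager.sh start on a spare machine). With -p5 and one
slot per TM, that means 6 TMs: if one dies, the restart strategy redeploys the
job onto the 5 survivors.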

Best, Fabian


2017-10-06 13:56 GMT+02:00 r. r. <robert@abv.bg>:

> Hello
> I have set up a cluster and added taskmanagers manually with
> bin/taskmanager.sh start.
> I noticed that if I have 5 task managers with one slot each and start a
> job with -p5, then if I stop a taskmanager the job fails even though there
> are 4 more taskmanagers.
>
> Is this expected (I turned off the restart strategy)?
> So the way to ensure continuous operation of a single "job" is to have
> e.g. 10 TM and deploy 10 job instances to fill each of 10 slots?
> Or if I have a job that requires -p3, for example, should I always have
> at least 3 TMs alive?
>
> Many thanks!
> -Rob
>
>
