flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Should the entire cluster be restarted if a single Task Manager crashes?
Date Fri, 18 Jan 2019 11:00:21 GMT
Hi Harshith,

No, you don't need to restart the whole cluster. Flink only needs enough
free processing slots to recover the job.
If you have a standby TM, the job should restart immediately (according to
its restart strategy). Otherwise, you have to start a new TM to provide the
missing slots. Once its slots are registered, the job recovers.
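
To make the slot requirement concrete, here is a rough sketch of the arithmetic in plain Python. It is not Flink code: the function name and all numbers are made up for illustration, and it assumes default slot sharing, so the job needs as many slots as its highest operator parallelism.

```python
def can_recover(task_managers_alive, slots_per_tm, job_parallelism):
    """Return True if the surviving TMs offer enough slots for the job.

    Assumption: default slot sharing, so the number of required slots
    equals the job's highest operator parallelism.
    """
    available_slots = task_managers_alive * slots_per_tm
    return available_slots >= job_parallelism


# Hypothetical setup: 3 TMs with 4 slots each.
print(can_recover(3, 4, 12))  # all TMs up, job uses every slot
print(can_recover(2, 4, 12))  # one TM crashed, no spare slots left
print(can_recover(2, 4, 8))   # smaller job still fits on the 2 healthy TMs
```

In the second case the job waits until a replacement TM registers its slots. In the third case the remaining TMs already act as standby capacity, so no new TM is needed.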

Best,
Fabian

On Fri, 18 Jan 2019 at 10:53, Kumar Bolar, Harshith <hkumv@arity.com>
wrote:

> Hi all,
>
> We're running a standalone Flink cluster with 2 Job Managers and 3 Task
> Managers. Whenever a TM crashes, we simply restart that particular TM and
> proceed with the processing.
>
> But reading the comments on this question
> <https://stackoverflow.com/questions/54149134/what-happen-to-state-in-flink-task-manager-when-crash>
> makes it look like we need to restart all the 5 nodes that form a cluster
> to deal with the failure of a single TM. Am I reading this right? What
> would be the consequences if we restart just the crashed TM and let the
> healthy ones run as is?
>
> Thanks,
> Harshith
