flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Siew Wai Yow <wai_...@hotmail.com>
Subject Re: What happen to state in Flink Task Manager when crash?
Date Mon, 14 Jan 2019 14:57:46 GMT
Thanks Dawid and Qiu!
Both of you clear all my doubts, perfect!

From: Dawid Wysakowicz <dwysakowicz@apache.org>
Sent: Monday, January 14, 2019 9:26 PM
To: Congxian Qiu; Siew Wai Yow
Cc: Jamie Grier; user@flink.apache.org
Subject: Re: What happen to state in Flink Task Manager when crash?


Pretty much just a rephrase of what others said. Flink's state is usually backed some highly
available distributed fs and upon checkpoint a consistent view of all local states is written
there, so it can be later restored from. As of now, any failure of a Task slot (e.g. if a
TM fails, all slots in that TM fail) will result in a job restart. If the remaining TMs have
enough slots to restart the job it will be restored onto them. The restoration always starts
with the checkpoint as an "entry point". That means all the states written there will be resdistributed
to the TMs. With task-local recovery feature [1] flink will try to distribute the state/tasks
so that the local snapshot can be reused. Hope that this clears things up.



[1] https://ci.apache.org/projects/flink/flink-docs-stable/ops/state/large_state_tuning.html#task-local-recovery

On 12/01/2019 05:46, Congxian Qiu wrote:
Hi, Siew Wai Yow

When the job is running, the states are stored in the local RocksDB, Flink will copy all the
needed states to checkpointPath when doing a Checkpoint.
If there have any failures, the job will be restored from the last previously *Successfully*
checkpoint, and assign the restored states to all the current TM
(These TMs do not need to be the same as before) .

Siew Wai Yow <wai_yow@hotmail.com<mailto:wai_yow@hotmail.com>> 于2019年1月12日周六
Thanks. But this is something I know. I would like to know will the other TM take over the
crashed TM's state to ensure data completion(say the state BYKEY, different key state will
be stored in different TM) OR the crashed TM need to be recovered to continue?

For example, 5 records,

TM1 stored state for rec1:KEYA, rec3:KEYA
TM2 stored state for rec2:KEYB, rec5:KEYB <--crashed, what happen to those state stored
TM3 stored state for rec4:KEYC

In case TM2 crashed, rec2 and rec5 will be assigned to other TM? Or those record only recover
when TM2 being recover?


From: Jamie Grier <jgrier@lyft.com<mailto:jgrier@lyft.com>>
Sent: Saturday, January 12, 2019 2:26 AM
To: Siew Wai Yow
Cc: user@flink.apache.org<mailto:user@flink.apache.org>
Subject: Re: What happen to state in Flink Task Manager when crash?

Flink is designed such that local state is backed up to a highly available system such as
HDFS or S3.  When a TaskManager fails state is recovered from there.

I suggest reading this:  https://ci.apache.org/projects/flink/flink-docs-release-1.7/ops/state/state_backends.html

On Fri, Jan 11, 2019 at 7:33 AM Siew Wai Yow <wai_yow@hotmail.com<mailto:wai_yow@hotmail.com>>

May i know what happen to state stored in Flink Task Manager when this Task manager crash.
Say the state storage is rocksdb, would those data transfer to other running Task Manager
so that complete state data is ready for data processing?



View raw message