flink-user mailing list archives

From <min....@ubs.com>
Subject Externalised checkpoint keeps ValueState after a crash of a Flink cluster
Date Mon, 04 Mar 2019 20:16:18 GMT
Hi,

I have a question about keeping a ValueState after a Flink 1.7.2 cluster crashes.

My Flink job is simple (a stripped-down sketch follows the list below):

1) read dummy events (each event has only a string id) from a Kafka source.
2) count the input events and keep the count in a ValueState.
3) set up an externalized checkpoint that runs every second.
4) fire a number of events (e.g. 2K, with four threads in parallel) and kill the task manager
or the job manager manually (kill processId) before all inputs are processed.
5) restart from the checkpoint in a local file directory.
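
For reference, here is a stripped-down sketch of the job. The topic name, broker address,
checkpoint directory, and class names are placeholders for illustration, not the exact ones
from my setup:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;
import org.apache.flink.util.Collector;

public class CheckpointTestJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // checkpoint every second; retain checkpoints on cancellation (externalized)
        env.enableCheckpointing(1000);
        env.getCheckpointConfig().enableExternalizedCheckpoints(
                CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);
        env.setStateBackend(new FsStateBackend("file:///Users/min/Applications/flink1.7.2/log"));

        // placeholder broker and group id
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "checkpoint-test");

        env.addSource(new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props))
                .keyBy(new KeySelector<String, String>() {
                    @Override
                    public String getKey(String id) {
                        return id;   // key by the event's string id
                    }
                })
                .process(new CountFunction())
                .print();

        env.execute("checkpoint-test");
    }

    // counts events per key in a ValueState; this is the state I expect the
    // checkpoint to restore after a crash
    public static class CountFunction extends KeyedProcessFunction<String, String, Long> {

        private transient ValueState<Long> count;

        @Override
        public void open(Configuration parameters) {
            count = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("count", Long.class));
        }

        @Override
        public void processElement(String event, Context ctx, Collector<Long> out)
                throws Exception {
            Long current = count.value();
            long updated = (current == null) ? 1L : current + 1L;
            count.update(updated);
            out.collect(updated);
        }
    }
}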

When the task manager gets killed and a replacement task manager restarts, the checkpoint
works well, i.e. the second run picks up the events left over from the first and produces
the correct total count (e.g. 8K).

When the job manager gets killed and a replacement job manager restarts, the checkpoint does
not work well. The second run still picks up the events left over from the first, BUT it does
not produce the correct total count (e.g. 8K minus the count accumulated before the crash).

The command to start the recovery job is: bin/flink run -s file:///Users/min/Applications/flink1.7.2/log/c449fad6fbaff3daad6bc526b8a74d18

Any idea about what I could have done incorrectly?

I expected an externalized checkpoint to keep any ValueState after a crash of the Flink
cluster.

Thank you very much for your help in advance.

Regards,

Min