flink-user mailing list archives

From Eyal Pe'er <eyal.p...@startapp.com>
Subject Fault tolerance in Flink file Sink
Date Thu, 23 Apr 2020 10:11:48 GMT
Hi all,
I am using Flink streaming with Kafka consumer connector (FlinkKafkaConsumer) and file Sink
(StreamingFileSink) in a cluster mode with exactly once policy.
The file sink writes the files to the local disk.
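For context, the pipeline described above roughly corresponds to a job like the following sketch (Flink 1.10-era API; the topic name, broker address, group id, and output path are placeholders, not taken from the original mail):

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class KafkaToFileJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Exactly-once for StreamingFileSink requires checkpointing:
        // in-progress part files stay hidden (dot-prefixed) and are only
        // finalized when the enclosing checkpoint completes.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder
        props.setProperty("group.id", "file-sink-job");       // placeholder

        FlinkKafkaConsumer<String> source =
                new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props);

        // Writing to a local path; on restart, recovery expects the previous
        // in-progress files to be visible to the recovering task manager.
        StreamingFileSink<String> sink = StreamingFileSink
                .forRowFormat(new Path("file:///data/out"), new SimpleStringEncoder<String>())
                .build();

        env.addSource(source).addSink(sink);
        env.execute("kafka-to-file");
    }
}
```

Because the sink writes to local disk rather than a shared filesystem, the recovery behavior described below depends on which task manager the restarted task lands on.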
I've noticed that if a job fails and automatic restart is on, the task managers look for the
leftover files from the last failed job (hidden in-progress files).
Since the tasks can be reassigned to different task managers, which cannot find those files
on their local disks, this leads to repeated failures on every restart.
The only solution I found so far is to delete the hidden files and resubmit the job.
If I understand correctly (and please correct me if I'm wrong), the events in the hidden files
were never committed back to the Kafka brokers, so there is no data loss.

Is there a way to force Flink to ignore files that were already written? Or is there
a better way to implement this (perhaps somehow with savepoints)?

Best regards
Eyal Peer
