flink-user mailing list archives

From Yassine MARZOUGUI <y.marzou...@mindlytix.com>
Subject Re: Behaviour of the BucketingSink when checkpoints fail
Date Fri, 28 Apr 2017 13:53:29 GMT
Hi Aljoscha,

Thank you for your response. I guess I will then manually rename the
pending files. Does this, however, mean that the BucketingSink is not
exactly-once as described in the docs, since in this case (failure of the
job and failure of checkpoints) there will be duplicates? Or am I missing
something in the notion of exactly-once guarantees?
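Following the suggestion quoted below, the manual rename could be scripted roughly like this. It is a minimal sketch, not part of Flink. It assumes the output folder is reachable as a local or mounted filesystem (on HDFS you would use `hdfs dfs -mv` or the Hadoop FileSystem API instead). It also assumes the sink used the default pending prefix `_` and suffix `.pending`; adjust the constants if `setPendingPrefix`/`setPendingSuffix` were changed. The function name `finalize_pending` and its `accept` hook are hypothetical. `accept` is where you decide which pending files you know are correct. In-progress files are deliberately left alone, and the default is a dry run.

```python
from pathlib import Path

# Assumed BucketingSink defaults for pending files; change these if the sink
# was configured with a different pending prefix/suffix.
PENDING_PREFIX = "_"
PENDING_SUFFIX = ".pending"


def finalize_pending(base_dir, dry_run=True, accept=lambda path: True):
    """Rename '_<name>.pending' files under base_dir to '<name>'.

    Hypothetical helper: returns the (source, target) pairs that were renamed,
    or that would be renamed when dry_run is True.
    """
    renamed = []
    pattern = PENDING_PREFIX + "*" + PENDING_SUFFIX
    # rglob walks base_dir and every bucket sub-directory below it.
    for src in sorted(Path(base_dir).rglob(pattern)):
        final_name = src.name[len(PENDING_PREFIX):-len(PENDING_SUFFIX)]
        if not final_name:
            continue  # a bare "_.pending" has no usable final name
        dst = src.with_name(final_name)
        if dst.exists():
            continue  # never overwrite a file that is already final
        if not accept(src):
            continue  # caller decided this pending file is not trustworthy
        if not dry_run:
            src.rename(dst)
        renamed.append((src, dst))
    return renamed
```

Run it with `dry_run=True` first and inspect the returned pairs. Because the sink state cannot be rebuilt from the folder, deciding which pending files hold correct data (and whether that data will be emitted again after recovery) is still up to you.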


2017-04-28 15:47 GMT+02:00 Aljoscha Krettek <aljoscha@apache.org>:

> Hi,
> Yes, your analysis is correct. The pending files are not recognised as
> such because they were never in any checkpointed state that could be
> restored. I’m afraid it’s not possible to build the sink state just from
> the files existing in the output folder. The reason we have state in the
> first place is so that we can figure out what each of the files in the
> output folder is.
> Maybe you could manually move the pending files that you know are correct
> to “final”?
> Best,
> Aljoscha
> On 28. Apr 2017, at 11:22, Yassine MARZOUGUI <y.marzougui@mindlytix.com>
> wrote:
> Hi all,
> I have a failed job containing a BucketingSink. The last successful
> checkpoint was taken before the source started emitting data. The following
> checkpoints all failed due to timeouts, as I mentioned here:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Checkpoints-very-slow-with-high-backpressure-td12762.html
> The TaskManager then failed. Upon recovery, the pending files did not
> move to the finished state.
> Is that because the sink was not able to checkpoint the list of pending
> files?
> Is it possible to build the sink state just from the output folder and the
> suffixes of the files?
> Thanks,
> Yassine
