flink-user mailing list archives

From gerardg <ger...@talaia.io>
Subject Re: Missing checkpoint when restarting failed job
Date Tue, 21 Nov 2017 14:16:38 GMT
> where exactly did you read many times that incremental checkpoints cannot
> reference files from previous checkpoints, because we would have to correct
> that information. In fact, this is how incremental checkpoints work.

My mistake. I read it in some other posts on the mailing list, but on
rereading them carefully, they were referring to savepoints, not checkpoints.

> Now for this case, I would consider it extremely unlikely that a
> checkpoint 1620 would still reference a checkpoint 1, in particular if the
> files for that checkpoint are already deleted, which should only happen if
> it is no longer referenced. Which version of Flink are you using and what
> is your distributed filesystem? Is there any way to reproduce the problem?

We are using Flink 1.3.2 on GlusterFS. Several checkpoint directories usually
exist at the same time; for example, right now:

chk-1  chk-26  chk-27  chk-28  chk-29  chk-30  chk-31

I'm not sure how to reproduce the problem, but I'll monitor the folder to see
when chk-1 gets deleted and try to make the task fail when that happens.
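
A minimal sketch of how that monitoring could be done, assuming the
checkpoint directory is mounted locally and contains chk-<n> subdirectories
like the listing above. The function names, the polling interval and the
example path are illustrative assumptions and not part of Flink:

```python
import os
import time

PREFIX = "chk-"


def checkpoint_id(name):
    """Return the numeric id of a 'chk-<n>' directory name."""
    return int(name[len(PREFIX):])


def list_checkpoints(checkpoint_dir):
    """Return the set of 'chk-<n>' entries currently in checkpoint_dir."""
    return {
        name
        for name in os.listdir(checkpoint_dir)
        if name.startswith(PREFIX) and name[len(PREFIX):].isdigit()
    }


def diff_checkpoints(before, after):
    """Return (added, removed) directory names, each sorted by checkpoint id."""
    added = sorted(after - before, key=checkpoint_id)
    removed = sorted(before - after, key=checkpoint_id)
    return added, removed


def watch(checkpoint_dir, target="chk-1", interval_s=1.0):
    """Poll checkpoint_dir and log every change until `target` disappears."""
    previous = list_checkpoints(checkpoint_dir)
    while target in previous:
        time.sleep(interval_s)
        current = list_checkpoints(checkpoint_dir)
        added, removed = diff_checkpoints(previous, current)
        if added or removed:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(stamp, "added:", added, "removed:", removed)
        previous = current
    print(target, "is no longer present")


# Example call (hypothetical path):
# watch("/mnt/gluster/flink/checkpoints/<job-id>")
```

Polling only shows that chk-1 vanished between two scans, so a short
interval narrows the window in which the task would need to fail.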

Gerard
