flink-user mailing list archives

From Stefan Richter <s.rich...@data-artisans.com>
Subject Re: S3 recovery and checkpoint directories exhibit explosive growth
Date Wed, 26 Jul 2017 16:17:36 GMT
Hi,

Your concerns about deleting files when using incremental checkpoints are very valid. Deleting
empty checkpoint folders is obviously OK. As for files, I have recently added some additional
logging to the checkpointing mechanism that reports the files referenced in the last checkpoint.
I will try to also include this logging in 1.3.2. Based on this, you could make safe assumptions
about which files are actually orphaned. I am even considering packing this list as a plain-text
file with the checkpoint, to make this more transparent for users.
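A minimal sketch of how such a list could drive orphan detection, assuming a hypothetical plain-text manifest with one referenced file path per line (the format Stefan mentions is only under consideration, so this layout is an assumption). The idea is a set difference: any file in checkpoint storage that no retained checkpoint references is a deletion candidate. The `min_age_s` grace period is also an assumption, added so that files belonging to a checkpoint still in progress are not touched; the sketch does not perform the job/JobManager liveness checks raised in the quoted message, and it only computes candidates rather than deleting anything.

```python
from __future__ import annotations

from pathlib import Path


def read_manifest(manifest: Path) -> set[str]:
    """Parse a HYPOTHETICAL plain-text manifest: one referenced file path per line."""
    return {line.strip() for line in manifest.read_text().splitlines() if line.strip()}


def find_orphans(all_files, referenced, mtimes, now, min_age_s=3600):
    """Return files that no retained checkpoint references and that are old enough.

    all_files:  paths currently present in checkpoint storage.
    referenced: union of paths referenced by all retained (completed) checkpoints.
    mtimes:     mapping path -> last-modified time in seconds since the epoch.
    now:        current time in seconds since the epoch.
    min_age_s:  assumed grace period; younger files may belong to an in-progress checkpoint.
    """
    orphans = []
    for path in sorted(all_files):
        if path in referenced:
            continue  # still needed by some retained checkpoint
        if now - mtimes[path] < min_age_s:
            continue  # too recent to be considered safely orphaned
        orphans.append(path)
    return orphans


if __name__ == "__main__":
    files = ["chk-10/a", "shared/x", "shared/y", "shared/z"]
    referenced = {"chk-10/a", "shared/x"}
    mtimes = {"chk-10/a": 0, "shared/x": 0, "shared/y": 0, "shared/z": 9000}
    print(find_orphans(files, referenced, mtimes, now=10000))
```

With incremental checkpoints, `referenced` should be the union over every retained checkpoint's manifest (for example `set().union(*(read_manifest(m) for m in manifests))`), not just the latest one, since a shared file may still be needed by an older retained checkpoint.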

Best,
Stefan

> On 26.07.2017 at 16:57, prashantnayak <prashant@intellifylearning.com> wrote:
> 
> Thanks Stephan and Stefan
> 
> We're looking forward to this patch in 1.3.2
> 
> We will use a patched version depending upon when 1.3.2 is going to be
> available.
> 
> We're also implementing a cron job to remove orphaned/older
> completedCheckpoint files per your recommendations. One caveat with a job
> like that is that we have to check whether a particular job is
> stopped/paused/down, and also whether the Job Manager is down, so we don't
> accidentally remove valid checkpoint files. This makes it a bit dicey;
> ideally, of course, we would not have to do this at all.
> 
> The move away from Hadoop/S3 would be welcome as well.
> 
> Flink job state is critical to us since we have very long running jobs
> (months) processing hundreds of millions of records.  
> 
> Thanks
> Prashant
> 
> 
> 
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/S3-recovery-and-checkpoint-directories-exhibit-explosive-growth-tp14270p14477.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

