flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stefan Richter <s.rich...@data-artisans.com>
Subject Re: S3 recovery and checkpoint directories exhibit explosive growth
Date Wed, 26 Jul 2017 07:48:53 GMT

I think Stephan was talking about removing this part:

	try {
		FileUtils.deletePathIfEmpty(fs, filePath.getParent());
	} catch (Exception ignored) {}

This part should *NOT* be removed:

	fs.delete(filePath, false);

The reason is that the first is only an additional cleanup that, for each deleted file, checks
if the containing directory is now empty and then removes the directory. Checking for empty
directories includes listing all files in the directory, which is expensive in S3 and the
only downside of removing the line are orphaned empty checkpoint directories, which could
be cleaned by a script.

Delete of the file itself must remain because that is how we release files from old checkpoints.


> Am 26.07.2017 um 04:31 schrieb prashantnayak <prashant@intellifylearning.com>:
> Hi Stephan
> Unclear on what you mean by the "trash" option... thought that was only
> available for command line hadoop and not applicable for API, which is what
> Flink uses?  If there is a configuration for the Flink/Hadoop connector,
> please let me know.
> Also, one additional thing about S3.... S3 supports this option of
> "versioned" buckets... Since versioning basically results in a delete on a
> object not actually deleting it (unless there is a bucket lifecycle
> policy)... I think you should recommend that Flink users that rely on S3
> turn off bucket versioning since it seems to not really be a factor for
> Flink...
> Thanks
> Prashant
> --
> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/S3-recovery-and-checkpoint-directories-exhibit-explosive-growth-tp14270p14453.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive at Nabble.com.

View raw message