flink-user mailing list archives

From Stefan Richter <s.rich...@data-artisans.com>
Subject Re: Empty directories left over from checkpointing
Date Wed, 20 Sep 2017 12:49:32 GMT

We recently removed some cleanup code because it had to query the store's metadata to decide
when a directory could be deleted. For certain stores (like S3), requesting this metadata on
every file delete was so expensive that it could bring down the job, because state removal
could not be processed fast enough. We have a temporary fix in place now so that jobs at
large scale can still run reliably on stores like S3. Currently, this comes at the cost of
not cleaning up directories, but we plan to introduce a different mechanism for directory
cleanup in the future, one that is not as fine-grained as doing metadata queries per file
delete. In the meantime, unfortunately, the best option is to clean up empty directories
with some external tool.
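
As a rough illustration of what such an external tool could do, here is a minimal Python
sketch. It is not part of Flink. It assumes the empty directories show up in a bucket listing
as marker keys ending in "/" (this depends on the file system connector in use), and that you
already have the full list of keys under the checkpoint prefix from your S3 client. The
function name find_empty_dir_markers and the example keys are hypothetical. The function only
computes deletion candidates; listing and deleting are left to whatever client you use.

```python
from typing import Iterable, List


def find_empty_dir_markers(keys: Iterable[str]) -> List[str]:
    """Return directory-marker keys (ending in '/') that have no files beneath them.

    Assumption: empty directories appear in the listing as keys with a trailing '/'.
    The result is ordered deepest first, so nested markers can be deleted before
    their parents.
    """
    all_keys = sorted(set(keys))
    markers = [k for k in all_keys if k.endswith("/")]
    files = [k for k in all_keys if not k.endswith("/")]

    # A marker is a candidate if no real object lives under its prefix.
    empty = [m for m in markers if not any(f.startswith(m) for f in files)]

    # Deepest paths first; the sort is stable, so ties keep lexicographic order.
    empty.sort(key=lambda k: k.count("/"), reverse=True)
    return empty


if __name__ == "__main__":
    # Hypothetical listing of a checkpoint prefix.
    listing = [
        "checkpoints/job/",
        "checkpoints/job/chk-1/",
        "checkpoints/job/chk-2/",
        "checkpoints/job/chk-2/state-file",
        "checkpoints/job/chk-3/",
        "checkpoints/job/chk-3/shared/",
    ]
    for key in find_empty_dir_markers(listing):
        print(key)
```

If you run something like this against a live job, it is probably wise to skip markers that
were created recently, since a checkpoint in progress may be about to write into them.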


> On 20.09.2017 at 01:23, Hao Sun <hasun@zendesk.com> wrote:
> Thanks Elias! Seems like there is no better answer than "do not care about them now",
> or delete with a background job.
> On Tue, Sep 19, 2017 at 4:11 PM, Elias Levy <fearsome.lucidity@gmail.com> wrote:
> There are a couple of related JIRAs:
> https://issues.apache.org/jira/browse/FLINK-7587
> https://issues.apache.org/jira/browse/FLINK-7266
> On Tue, Sep 19, 2017 at 12:20 PM, Hao Sun <hasun@zendesk.com> wrote:
> Hi, I am using RocksDB and S3 as the storage backend for my checkpoints.
> Can Flink delete these empty directories automatically? Or do I need a background job to
> do the deletion?
> I know this has been discussed before, but I could not get a concrete answer for it yet.
> <image.png>
