flink-user mailing list archives

From Kostas Kloudas <k.klou...@data-artisans.com>
Subject Re: ContinuousFileMonitoringFunction - deleting file after processing
Date Tue, 18 Oct 2016 09:05:26 GMT
Hi Maciek,

Currently this functionality is not supported, but it seems like a good addition.
Actually, given that the feature is rather new, we were thinking of opening a discussion
on the dev mailing list in order to:

i) discuss some current limitations of the Continuous File Processing source
ii) see how people use it and adjust our features accordingly

I will let you know as soon as I open this thread.

By the way, for your use case we should probably hook into notifyCheckpointComplete(),
which informs the source that a given checkpoint completed successfully. At that point
we can purge the files that have already been processed. This could be a good solution.
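
To make the idea concrete, here is a minimal sketch in plain Java. It is not Flink code and not an existing feature: ProcessedFilePurger, markProcessed() and snapshot() are hypothetical names, and only the notifyCheckpointComplete(long) name is modeled on Flink's CheckpointListener callback. It also assumes the caller can tell when every split of a file has been processed, and that a completed checkpoint covers all earlier pending ones.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/**
 * Hypothetical helper (not part of Flink): remembers which files were fully
 * processed before each checkpoint and deletes them only after that
 * checkpoint has completed.
 */
public class ProcessedFilePurger {

    // Files finished since the last snapshot, not yet tied to a checkpoint.
    private final List<Path> finishedSinceLastSnapshot = new ArrayList<>();

    // Files covered by each pending checkpoint, ordered by checkpoint ID.
    private final TreeMap<Long, List<Path>> pendingByCheckpoint = new TreeMap<>();

    /** Call once every split of the file has been processed (assumed detectable). */
    public void markProcessed(Path file) {
        finishedSinceLastSnapshot.add(file);
    }

    /** Call when the checkpoint with the given ID is taken. */
    public void snapshot(long checkpointId) {
        pendingByCheckpoint.put(checkpointId, new ArrayList<>(finishedSinceLastSnapshot));
        finishedSinceLastSnapshot.clear();
    }

    /**
     * Modeled on CheckpointListener#notifyCheckpointComplete(long).
     * Returns the deleted paths purely for illustration.
     */
    public List<Path> notifyCheckpointComplete(long checkpointId) {
        List<Path> deleted = new ArrayList<>();
        // View of this checkpoint and any earlier ones that are still pending.
        NavigableMap<Long, List<Path>> covered = pendingByCheckpoint.headMap(checkpointId, true);
        for (List<Path> files : covered.values()) {
            for (Path file : files) {
                try {
                    if (Files.deleteIfExists(file)) {
                        deleted.add(file);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }
        covered.clear(); // also removes the entries from pendingByCheckpoint
        return deleted;
    }
}
```

Files are deleted only after the checkpoint that covers them has completed, so a restore from the latest completed checkpoint should never need a file that is already gone.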

Thanks,
Kostas

> On Oct 18, 2016, at 9:40 AM, Maciek Próchniak <mpr@touk.pl> wrote:
> 
> Hi,
> 
> we want to monitor hdfs (or local) directory, read csv files that appear and after successful processing - delete them (mainly not to run out of disk space...)
> 
> I'm not quite sure how to achieve it with the current implementation. Previously, when we read binary data (unsplittable files), we made a small hack and deleted them in our FileInputFormat - but now we want to use splits, and detecting which split is 'the last one' is no longer so obvious - of course it's also problematic when it comes to checkpointing...
> 
> So my question is - is there an idiomatic way of deleting processed files?
> 
> 
> thanks,
> 
> maciek
> 

