flink-user mailing list archives

From Stephan Ewen <se...@apache.org>
Subject Re: S3 recovery and checkpoint directories exhibit explosive growth
Date Mon, 24 Jul 2017 18:12:08 GMT
Hi Prashant!

Flink's S3 integration currently goes through Hadoop's S3 file system (as
you probably noticed).

It seems that Hadoop's S3 file system is not well suited to what we want
to do, and we are looking to drop it and replace it with something direct
(independent of Hadoop) in the coming release...

One essential thing is to make sure that the "trash" is not activated in
the configuration, as it adds very high overhead to delete operations.
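
As a quick way to check this, here is a minimal sketch that inspects a Hadoop configuration document for the trash setting. It assumes the trash is controlled by the Hadoop property `fs.trash.interval` (in minutes, where 0 or an unset value means the trash is disabled). The function names and the sample XML are hypothetical and only for illustration. It reads a single document and does not reproduce Hadoop's full resolution of defaults and overrides across files.

```python
import xml.etree.ElementTree as ET

# Assumption: Hadoop's trash is governed by "fs.trash.interval" (minutes);
# a value of 0, or no value at all, means the trash is disabled.
TRASH_PROPERTY = "fs.trash.interval"


def trash_interval_minutes(core_site_xml: str) -> float:
    """Return the fs.trash.interval found in the given XML, or 0.0 if unset.

    If the property appears more than once, the last occurrence is used.
    """
    root = ET.fromstring(core_site_xml)
    interval = 0.0
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == TRASH_PROPERTY:
            interval = float((prop.findtext("value") or "0").strip())
    return interval


def trash_enabled(core_site_xml: str) -> bool:
    """True if the configuration activates the trash."""
    return trash_interval_minutes(core_site_xml) > 0


# Hypothetical configuration content, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>fs.trash.interval</name>
    <value>0</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(trash_enabled(SAMPLE))  # prints False: trash is off in SAMPLE
```

If this reports the trash as enabled for the configuration your Flink setup picks up, setting the interval to 0 there should avoid the extra overhead on deletes.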


On Mon, Jul 24, 2017 at 7:56 PM, Stephan Ewen <sewen@apache.org> wrote:

> Hi Prashant!
> I assume you are using Flink 1.3.0 or 1.3.1?
> Here are some things you can do:
>   - I would try disabling incremental checkpointing for a start and
> see what happens then. That alone should reduce the number of files.
>   - Is it possible for you to run a patched version of Flink? If yes, can
> you try to do the following: In the class "FileStateHandle", in the method
> "discardState()", remove the code around "FileUtils.deletePathIfEmpty(...)"
> - this is probably not working well when hitting too many S3 files.
>   - You can delete old "completedCheckpointXXXYYY" files, but please do
> not delete the other two types; they are needed for HA recovery.
> Greetings,
> Stephan
> On Mon, Jul 24, 2017 at 3:46 AM, prashantnayak <prashant@intellifylearning.com> wrote:
>> Hi Xiaogang and Stephan
>> We're continuing to test and have now set up the cluster with
>> incremental RocksDB checkpointing disabled and the checkpoint interval
>> increased from 30s to 120s (not ideal really :-( )
>> We'll run it with a large number of jobs and report back if this setup
>> shows improvement.
>> Appreciate any other insights you might have around this problem.
>> Thanks
>> Prashant
>> --
>> View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/S3-recovery-and-checkpoint-directories-exhibit-explosive-growth-tp14270p14392.html
>> Sent from the Apache Flink User Mailing List archive at Nabble.com.
