flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Fabian Hueske <fhue...@gmail.com>
Subject Re: Large number of sources in Flink Job
Date Mon, 28 May 2018 08:48:09 GMT
Hi Chirag,

There have been some issue with very large execution graphs.
You might need to adjust the default configuration and configure larger
Akka buffers and/or timeouts.

Also, 2000 sources means that you run at least 2000 threads at once.

The FileInputFormat (and most of its sub-classes) in Flink 1.5.0 can be
configured to accept multiple directories.
This would be a preferred approach to creating one source per directory.

Best, Fabian

2018-05-28 6:35 GMT+02:00 Chirag Dewan <chirag.dewan22@yahoo.in>:

> Hi,
>
> I am working on a use case where my Flink job needs to collect data from
> thousands of sources.
>
> As an example, I want to collect data from more than 2000 File
> Directories, process(filter, transform) the data and distribute the
> processed data streams to 200 different directories.
>
> Are there any caveats I should know with such large number of sources,
> also taking into account per operator parallelism?
>
> Regards,
>
> Chirag
>
>

Mime
View raw message