flink-user mailing list archives

From Ufuk Celebi <...@apache.org>
Subject Re: Great number of jobs and numberOfBuffers
Date Thu, 17 Aug 2017 09:23:50 GMT
Hey Gwenhael,

the network buffers are recycled automatically after a job terminates.
If this does not happen, it would be quite a major bug.

To help debug this:

- Which version of Flink are you using?
- Does the job fail immediately after submission or later during execution?
- Is the following correct: the batch job that eventually fails
because of missing network buffers runs without problems if you submit
it to a fresh cluster with the same memory configuration?

The network buffers are recycled after the task managers report their
tasks as finished. If you immediately submit the next batch, there is
a slight chance that the buffers have not been recycled yet. As a
possible temporary workaround, could you try waiting for a short
amount of time before submitting the next batch?
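As a rough sketch of that workaround (not from the original mail): a driver loop that pauses between batch submissions. The names `runBatches` and `submitBatch` are illustrative; `submitBatch` stands in for creating a fresh environment and calling `execute()` on it.

```java
import java.util.List;
import java.util.function.Consumer;

public class BatchSubmitter {
    // Hypothetical sketch: submit each batch, then pause so the task
    // managers have time to report tasks finished and the network
    // buffers can be recycled before the next submission.
    public static void runBatches(List<String> batches, long pauseMillis,
                                  Consumer<String> submitBatch) {
        for (String batch : batches) {
            submitBatch.accept(batch);  // e.g. build env, env.execute(...)
            try {
                Thread.sleep(pauseMillis);  // grace period between batches
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }
}
```

The pause length is a guess to tune; even a few seconds between submissions may be enough for the buffers to come back.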

I think we should also be able to run the job without splitting it up
after increasing the network memory configuration. Did you already try
this?
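For reference, the knob Ufuk is referring to is set in flink-conf.yaml. In older Flink versions it is `taskmanager.network.numberOfBuffers`; from 1.3 on, the network memory is instead sized via the `taskmanager.network.memory.*` settings. The values below are examples only; the right sizes depend on the job and cluster.

```yaml
# Pre-1.3 style: raise the total number of network buffers
taskmanager.network.numberOfBuffers: 4096

# 1.3+ style: size the network memory pool directly
taskmanager.network.memory.fraction: 0.1
taskmanager.network.memory.min: 67108864
taskmanager.network.memory.max: 1073741824
```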

Best,

Ufuk


On Thu, Aug 17, 2017 at 10:38 AM, Gwenhael Pasquiers
<gwenhael.pasquiers@ericsson.com> wrote:
> Hello,
>
> We’re hitting a limit with the numberOfBuffers.
>
> In a quite complex job we do a lot of operations, with a lot of
> operators, on a lot of folders (datehours).
>
> In order to split the job into smaller “batches” (to limit the
> necessary “numberOfBuffers”) I’ve written a loop over the batches
> (handling the datehours 3 by 3); for each batch I create a new env
> and then call its execute() method.
>
> However it looks like there is no cleanup: after a while, if the
> number of batches is too big, there is an error saying that the
> numberOfBuffers isn’t high enough. It kind of looks like a leak. Is
> there a way to clean them up?
