flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: Optimal Configuration for Cluster
Date Mon, 22 Feb 2016 10:26:19 GMT
Hi Welly,

sorry for the late response.

The number of network buffers primarily depends on the maximum parallelism
of your job.
The given formula assumes a specific cluster configuration (1 task manager
per machine, one parallel task per CPU).
The formula can be translated to:

taskmanager.network.numberOfBuffers: p ^ 2 * t * 4

where p is the maximum parallelism of the job and t is the number of task
managers (TMs).
You can process more than one parallel task per TM if you configure more
than one processing slot per machine (taskmanager.numberOfTaskSlots). The
TM will divide its memory among all its slots. So it would be possible to
start one TM for each machine with 100GB+ memory and 48 slots each.
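
As a rough sketch of the formula above, the following Python snippet computes
the buffer count and the memory those buffers would occupy. The function names
are made up for illustration, and the 32 KB buffer size is an assumption (check
the memory segment size of your installation). The example values p = 48 and
t = 16 are taken from the cluster described in the quoted question and may not
match the actual job.

```python
def network_buffers(max_parallelism: int, task_managers: int) -> int:
    """Buffer count from the rule of thumb p ^ 2 * t * 4."""
    return max_parallelism ** 2 * task_managers * 4


def buffer_memory_mb(num_buffers: int, buffer_size_bytes: int = 32 * 1024) -> float:
    """Memory taken by the buffers, in MB.

    buffer_size_bytes defaults to 32 KB, which is an assumption here.
    """
    return num_buffers * buffer_size_bytes / (1024 * 1024)


if __name__ == "__main__":
    buffers = network_buffers(max_parallelism=48, task_managers=16)
    print("taskmanager.network.numberOfBuffers:", buffers)
    print("approx. buffer memory (MB):", buffer_memory_mb(buffers))
```

Because the formula grows with the square of p, doubling the parallelism
quadruples the number of buffers, so the buffer memory can quickly exceed a
small TM heap.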

We can compute the number of network buffers if you give a few more details
about your setup:
- How many task managers do you start? I assume more than one TM per
machine given that you assign only 4GB of memory out of 128GB to each TM.
- What is the maximum parallelism of your program?
- How many processing slots do you configure for each TM?
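
To illustrate how the answers to these questions fit together, here is a
minimal Python sketch. The helper name and the example values (one TM per
machine, 48 slots, 100 GB of memory per TM) are hypothetical. The even per-slot
split is only an approximation of how a TM shares its memory, and the check
assumes the job needs one slot per parallel instance.

```python
def describe_setup(machines: int, tms_per_machine: int, slots_per_tm: int,
                   tm_memory_mb: int, max_parallelism: int) -> dict:
    """Derive cluster-wide numbers from the per-TM settings."""
    total_tms = machines * tms_per_machine
    total_slots = total_tms * slots_per_tm
    return {
        "total_tms": total_tms,
        "total_slots": total_slots,
        # Approximation: memory split evenly among the slots of one TM.
        "memory_per_slot_mb": tm_memory_mb / slots_per_tm,
        # Assumption: one slot is needed per parallel instance of the job.
        "enough_slots": max_parallelism <= total_slots,
    }


if __name__ == "__main__":
    setup = describe_setup(machines=16, tms_per_machine=1, slots_per_tm=48,
                           tm_memory_mb=100 * 1024, max_parallelism=768)
    for key, value in setup.items():
        print(key, "=", value)
```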

In general, pipelined shuffles with a high parallelism require a lot of
memory. If you configure batch instead of pipelined transfer, the memory
requirement goes down.

Eventually, we want to merge the network buffer and the managed memory
pools. So the "taskmanager.network.numberOfBuffers" configuration will
hopefully disappear at some point in the future.

Best, Fabian

2016-02-19 9:34 GMT+01:00 Welly Tambunan <if05041@gmail.com>:

> Hi All,
> We are trying to run our job on a cluster with the following specs:
> 1. # of machines: 16
> 2. memory: 128 GB
> 3. # of cores: 48
> However, when we try to run it we get an exception:
> "insufficient number of network buffers. 48 required but only 10
> available. the total number of network buffers is currently set to 2048"
> After looking at the documentation, we set the configuration based on the docs:
> taskmanager.network.numberOfBuffers: # core ^ 2 * # machine * 4
> However, we then hit another error from the JVM:
> java.io.IOException: Cannot allocate network buffer pool: Could not
> allocate enough memory segments for NetworkBufferPool (required (Mb): 2304,
> allocated (Mb): 698, missing (Mb): 1606). Cause: Java heap space
> We fiddled with taskmanager.heap.mb, setting it to 4096.
> Finally the cluster is running.
> However, I'm still not sure about the configuration, or whether fiddling
> with the task manager heap really counts as fine-tuning. So my questions are:
>    1. Am i doing it right for numberOfBuffers ?
>    2. How much should we allocate on taskmanager.heap.mb given the
>    information
>    3. Any suggestion which configuration we need to set to make it
>    optimal for the cluster ?
>    4. Is there any chance that this will get resolved automatically by
>    the memory/network buffer manager ?
> Thanks a lot for the help
> Cheers
> --
> Welly Tambunan
> Triplelands
> http://weltam.wordpress.com
> http://www.triplelands.com <http://www.triplelands.com/blog/>
