From Fabian Hueske <fhue...@gmail.com>
Subject Re: Optimal Configuration for Cluster
Date Mon, 22 Feb 2016 11:34:52 GMT
```Hi Welly,

I have to correct the formula I posted before:

taskmanager.network.numberOfBuffers: p ^ 2 * t * 4

p is NOT the parallelism of the job, BUT the number of slots of a task
manager.

So if you configure one TM for each machine with 48 slots, you get:
48^2 * 16 * 4 = 147,456 buffers. With 32KB per buffer, you need 4.5GB of
network memory for each TM, i.e. 4.5GB per machine.

If you configure 48 TMs for each machine with 1 slot each, you get:
1^2 * (48*16) * 4 = 3,072 buffers. With 32KB per buffer, that is 96MB per TM
and 4.5GB per machine (with 48 TMs per machine).

Batch transfers are only possible for DataSet (batch) programs.

Hope this helps, Fabian

2016-02-22 11:26 GMT+01:00 Fabian Hueske <fhueske@gmail.com>:

> Hi Welly,
>
> sorry for the late response.
>
> The number of network buffers primarily depends on the maximum parallelism
> of your job.
> The given formula assumes a specific cluster configuration (1 task manager
> per machine, one parallel task per CPU).
> The formula can be translated to:
>
> taskmanager.network.numberOfBuffers: p ^ 2 * t * 4
>
> where p is the maximum parallelism of the job and t is the number of task
> managers.
> You can process more than one parallel task per TM if you configure more
> processing slots.
> The TM will divide its memory among all its slots. So it would be possible
> to start one TM for each machine with 100GB+ memory and 48 slots each.
>
> We can compute the number of network buffers if you give a few more
> details:
> - How many task managers do you start? I assume more than one TM per
> machine given that you assign only 4GB of memory out of 128GB to each TM.
> - What is the maximum parallelism of your program?
> - How many processing slots do you configure for each TM?
>
> In general, pipelined shuffles with a high parallelism require a lot of
> memory.
> If you configure batch instead of pipelined transfer, the memory
> requirement goes down
> (ExecutionConfig.setExecutionMode(ExecutionMode.BATCH)).
>
> Eventually, we want to merge the network buffer and the managed memory
> pools. So the "taskmanager.network.numberOfBuffers" configuration will
> hopefully disappear at some point in the future.
>
> Best, Fabian
>
> 2016-02-19 9:34 GMT+01:00 Welly Tambunan <if05041@gmail.com>:
>
>> Hi All,
>>
>> We are trying to run our job in a cluster that has these specs:
>>
>> 1. # of machine: 16
>> 2. memory : 128 gb
>> 3. # of core : 48
>>
>> However when we try to run we have an exception.
>>
>> "insufficient number of network buffers. 48 required but only 10
>> available. the total number of network buffers is currently set to 2048"
>>
>> After looking at the documentation we set configuration based on docs
>>
>> taskmanager.network.numberOfBuffers: # core ^ 2 * # machine * 4
>>
>> However we face another error from JVM
>>
>> java.io.IOException: Cannot allocate network buffer pool: Could not
>> allocate enough memory segments for NetworkBufferPool (required (Mb): 2304,
>> allocated (Mb): 698, missing (Mb): 1606). Cause: Java heap space
>>
>> We fiddled with taskmanager.heap.mb: 4096
>>
>> Finally the cluster is running.
>>
>> However, I'm still not sure the configuration is right, or that fiddling
>> with the task manager heap really fine-tunes it. So my questions are:
>>
>>
>>    1. Am i doing it right for numberOfBuffers ?
>>    2. How much should we allocate on taskmanager.heap.mb given the
>>    information
>>    3. Any suggestion which configuration we need to set to make it
>>    optimal for the cluster ?
>>    4. Is there any chance that this will get automatically resolved by
>>    the memory/network buffer manager ?
>>
>> Thanks a lot for the help
>>
>> Cheers
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>

```
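The arithmetic in the corrected formula can be checked with a short script. This is only a sketch: the function names `network_buffers` and `network_memory_mb` are made up for illustration and are not part of Flink, and the 32KB buffer size is the value used in the message above. It computes both layouts discussed for the 16-machine, 48-core cluster.

```python
# Sketch of the formula from the message: buffers = p^2 * t * 4,
# where p = slots per task manager and t = number of task managers.
# Function names are hypothetical helpers, not a Flink API.

BUFFER_SIZE_KB = 32  # buffer size assumed in the message


def network_buffers(slots_per_tm: int, num_task_managers: int) -> int:
    """Value for taskmanager.network.numberOfBuffers per the formula."""
    return slots_per_tm ** 2 * num_task_managers * 4


def network_memory_mb(buffers: int, buffer_kb: int = BUFFER_SIZE_KB) -> float:
    """Network memory one task manager needs for that many buffers, in MB."""
    return buffers * buffer_kb / 1024


MACHINES = 16
CORES_PER_MACHINE = 48

# Layout A: one TM per machine, 48 slots each.
a_buffers = network_buffers(CORES_PER_MACHINE, MACHINES)
a_mb_per_tm = network_memory_mb(a_buffers)

# Layout B: 48 TMs per machine, 1 slot each.
b_buffers = network_buffers(1, CORES_PER_MACHINE * MACHINES)
b_mb_per_tm = network_memory_mb(b_buffers)
b_mb_per_machine = b_mb_per_tm * CORES_PER_MACHINE

print(f"A: {a_buffers} buffers, {a_mb_per_tm:.0f} MB per TM and per machine")
print(f"B: {b_buffers} buffers, {b_mb_per_tm:.0f} MB per TM, "
      f"{b_mb_per_machine:.0f} MB per machine")
```

Both layouts end up at the same 4.5GB of network memory per machine; only the per-TM figure differs.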