Mailing-List: contact user-help@flink.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@flink.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAAdrtT19vZ-c3KGr0kbgLiwYj3bh1XmAGeEFdBNwmgv-Omj1Cw@mail.gmail.com>
References: 
 <CABGNe=aG+hQLesNtocQ0FMdn42A=b+QiRaV33k3q6KgwmM7CoA@mail.gmail.com>
	<CAAdrtT19vZ-c3KGr0kbgLiwYj3bh1XmAGeEFdBNwmgv-Omj1Cw@mail.gmail.com>
Date: Tue, 23 Feb 2016 12:36:09 +0700
Message-ID: 
 <CABGNe=aG1CutV-ZWEjAB2BKx=4faH7UdPW1W23jz7XDwTZ1PGg@mail.gmail.com>
Subject: Re: Optimal Configuration for Cluster
From: Welly Tambunan <if05041@gmail.com>
To: user@flink.apache.org
Content-Type: multipart/alternative; boundary=001a114157c69600d1052c6953b9

--001a114157c69600d1052c6953b9
Content-Type: text/plain; charset=UTF-8

Hi Fabian,

Previously, with Flink 0.9 and 0.10, we started the cluster in either
streaming mode or batch mode. I see that this option is gone in the Flink
1.0 snapshot. So has this been taken care of by Flink, with the runtime now
optimizing it automatically?

On Mon, Feb 22, 2016 at 5:26 PM, Fabian Hueske <fhueske@gmail.com> wrote:

> Hi Welly,
>
> sorry for the late response.
>
> The number of network buffers primarily depends on the maximum parallelism
> of your job.
> The given formula assumes a specific cluster configuration (1 task manager
> per machine, one parallel task per CPU).
> The formula can be translated to:
>
> taskmanager.network.numberOfBuffers: p ^ 2 * t * 4
>
> where p is the maximum parallelism of the job and t is the number of task
> managers.
> You can process more than one parallel task per TM if you configure more
> than one processing slot per machine (taskmanager.numberOfTaskSlots).
> The TM will divide its memory among all its slots. So it would be possible
> to start one TM for each machine with 100GB+ memory and 48 slots each.
>
> We can compute the number of network buffers if you give a few more
> details about your setup:
> - How many task managers do you start? I assume more than one TM per
> machine given that you assign only 4GB of memory out of 128GB to each TM.
> - What is the maximum parallelism of your program?
> - How many processing slots do you configure for each TM?
>
> In general, pipelined shuffles with a high parallelism require a lot of
> memory.
> If you configure batch instead of pipelined transfer, the memory
> requirement goes down
> (ExecutionConfig.setExecutionMode(ExecutionMode.BATCH)).
>
> Eventually, we want to merge the network buffer and the managed memory
> pools. So the "taskmanager.network.numberOfBuffers" configuration will
> hopefully disappear at some point in the future.
>
> Best, Fabian
>
> 2016-02-19 9:34 GMT+01:00 Welly Tambunan <if05041@gmail.com>:
>
>> Hi All,
>>
>> We are trying to run our job on a cluster with the following specification:
>>
>> 1. # of machines: 16
>> 2. memory: 128 GB
>> 3. # of cores: 48
>>
>> However, when we try to run the job we get an exception:
>>
>> "insufficient number of network buffers. 48 required but only 10
>> available. the total number of network buffers is currently set to 2048"
>>
>> After looking at the documentation, we set the configuration based on it:
>>
>> taskmanager.network.numberOfBuffers: # core ^ 2 * # machine * 4
>>
>> However, we then hit another error from the JVM:
>>
>> java.io.IOException: Cannot allocate network buffer pool: Could not
>> allocate enough memory segments for NetworkBufferPool (required (Mb): 2304,
>> allocated (Mb): 698, missing (Mb): 1606). Cause: Java heap space
>>
>> We then set taskmanager.heap.mb: 4096
>>
>> Finally the cluster is running.
>>
>> However, I'm still not sure about this configuration, or whether adjusting
>> the task manager heap like this is really proper tuning. So my questions are:
>>
>>
>>    1. Am I setting numberOfBuffers correctly?
>>    2. How much should we allocate to taskmanager.heap.mb given the
>>    information above?
>>    3. Any suggestions on which configuration options we need to set to
>>    make it optimal for the cluster?
>>    4. Is there any chance that this will be resolved automatically by the
>>    memory/network buffer manager?
>>
>> Thanks a lot for the help
>>
>> Cheers
>>
>> --
>> Welly Tambunan
>> Triplelands
>>
>> http://weltam.wordpress.com
>> http://www.triplelands.com <http://www.triplelands.com/blog/>
>>
>
>
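
As a rough sanity check of the formula quoted above (p ^ 2 * t * 4), here
is a minimal Python sketch that evaluates it for the cluster described in
this thread. The function names are made up for illustration. The 32 KiB
buffer size is an assumption (I believe it is the default of
taskmanager.network.bufferSizeInBytes, but it is not stated in this thread),
and p = 48, t = 16 assume one task manager per machine with one slot per core.

```python
# Sketch only: evaluates the rule of thumb p^2 * t * 4 from the quoted mail.
# ASSUMPTION: each network buffer is 32 KiB; check your own flink-conf.yaml.
BUFFER_SIZE_BYTES = 32 * 1024


def network_buffers(max_parallelism: int, task_managers: int) -> int:
    """Number of network buffers suggested by p^2 * t * 4."""
    return max_parallelism ** 2 * task_managers * 4


def buffer_memory_mb(num_buffers: int,
                     buffer_size_bytes: int = BUFFER_SIZE_BYTES) -> float:
    """Memory needed to hold num_buffers buffers, in MiB."""
    return num_buffers * buffer_size_bytes / (1024 * 1024)


if __name__ == "__main__":
    # Hypothetical setup: p = 48 (one slot per core), t = 16 (one TM per machine).
    buffers = network_buffers(max_parallelism=48, task_managers=16)
    print(buffers, buffer_memory_mb(buffers))
```

Under these assumptions the formula gives 147456 buffers, which is about
4608 MiB of buffer memory at 32 KiB each, so the heap has to be sized with
the buffer pool in mind rather than adjusted by trial and error.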


-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>

--001a114157c69600d1052c6953b9--