flink-user mailing list archives

From Welly Tambunan <if05...@gmail.com>
Subject Re: Optimal Configuration for Cluster
Date Tue, 23 Feb 2016 10:57:21 GMT
Hi Ufuk and Fabian,

Is it better to start 48 task managers (one slot each) on one machine
than a single task manager with 48 slots? Are there any trade-offs that we
should know about?

Cheers

On Tue, Feb 23, 2016 at 3:03 PM, Welly Tambunan <if05041@gmail.com> wrote:

> Hi Ufuk,
>
> Thanks for the explanation.
>
> Yes. All of our jobs are streaming jobs.
>
> Cheers
>
> On Tue, Feb 23, 2016 at 2:48 PM, Ufuk Celebi <uce@apache.org> wrote:
>
>> The new default is equivalent to the previous "streaming mode". The
>> community decided to get rid of this distinction, because it was
>> confusing to users.
>>
>> The difference between "streaming mode" and "batch mode" was how
>> Flink's managed memory was allocated, either lazily when required
>> ("streaming mode") or eagerly on task manager start up ("batch mode").
>> Now it's lazy by default.
>>
>> This is not something you need to worry about, but if you are mostly
>> using the DataSet API where preallocation has benefits, you can get
>> the "batch mode" behaviour by using the following configuration key:
>>
>> taskmanager.memory.preallocate: true
>>
>> But you are using the DataStream API anyways, right?
>>
>> – Ufuk
>>
>>
>> On Tue, Feb 23, 2016 at 6:36 AM, Welly Tambunan <if05041@gmail.com> wrote:
>> > Hi Fabian,
>> >
>> > Previously, when using Flink 0.9-0.10, we started the cluster in
>> > streaming mode or batch mode. I see that this is gone in the Flink 1.0
>> > snapshot? So is this now taken care of by Flink and optimized by the
>> > runtime?
>> >
>> > On Mon, Feb 22, 2016 at 5:26 PM, Fabian Hueske <fhueske@gmail.com> wrote:
>> >>
>> >> Hi Welly,
>> >>
>> >> sorry for the late response.
>> >>
>> >> The number of network buffers primarily depends on the maximum
>> >> parallelism of your job.
>> >> The given formula assumes a specific cluster configuration (1 task
>> >> manager per machine, one parallel task per CPU).
>> >> The formula can be translated to:
>> >>
>> >> taskmanager.network.numberOfBuffers: p ^ 2 * t * 4
>> >>
>> >> where p is the maximum parallelism of the job and t is the number of
>> >> task managers.
>> >> You can process more than one parallel task per TM if you configure
>> >> more than one processing slot per machine (taskmanager.numberOfTaskSlots).
>> >> The TM will divide its memory among all its slots. So it would be
>> >> possible to start one TM for each machine with 100GB+ memory and 48
>> >> slots each.
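As a rough illustration of the formula quoted above, here is a minimal Python sketch that plugs in the cluster from this thread (p = 48, t = 16) and estimates how much memory that many buffers would occupy. The function names are hypothetical, and the 32 KiB buffer size is an assumption rather than something stated in the thread, so adjust it to match your own configuration.

```python
# Sketch of the buffer formula quoted in this thread: p^2 * t * 4.
# BUFFER_SIZE_BYTES is an assumption (32 KiB), not taken from the thread.
BUFFER_SIZE_BYTES = 32 * 1024


def network_buffers(p: int, t: int) -> int:
    """Buffer count for maximum parallelism p and t task managers."""
    return p ** 2 * t * 4


def buffer_memory_mib(num_buffers: int, buffer_size: int = BUFFER_SIZE_BYTES) -> float:
    """Memory occupied by num_buffers buffers, in MiB."""
    return num_buffers * buffer_size / (1024 * 1024)


if __name__ == "__main__":
    n = network_buffers(p=48, t=16)
    print(n, buffer_memory_mib(n))
```

Because the count grows with the square of the parallelism, halving p cuts the buffer count (and the memory it needs) to a quarter.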
>> >>
>> >> We can compute the number of network buffers if you give a few more
>> >> details about your setup:
>> >> - How many task managers do you start? I assume more than one TM per
>> >> machine given that you assign only 4GB of memory out of 128GB to each
>> >> TM.
>> >> - What is the maximum parallelism of your program?
>> >> - How many processing slots do you configure for each TM?
>> >>
>> >> In general, pipelined shuffles with a high parallelism require a lot of
>> >> memory.
>> >> If you configure batch instead of pipelined transfer, the memory
>> >> requirement goes down
>> >> (ExecutionConfig.setExecutionMode(ExecutionMode.BATCH)).
>> >>
>> >> Eventually, we want to merge the network buffer and the managed memory
>> >> pools. So the "taskmanager.network.numberOfBuffers" configuration will
>> >> hopefully disappear at some point in the future.
>> >>
>> >> Best, Fabian
>> >>
>> >> 2016-02-19 9:34 GMT+01:00 Welly Tambunan <if05041@gmail.com>:
>> >>>
>> >>> Hi All,
>> >>>
>> >>> We are trying to run our job on a cluster with the following setup:
>> >>>
>> >>> 1. # of machines: 16
>> >>> 2. memory: 128 GB
>> >>> 3. # of cores: 48
>> >>>
>> >>> However, when we try to run it we get an exception.
>> >>>
>> >>> "insufficient number of network buffers. 48 required but only 10
>> >>> available. the total number of network buffers is currently set to
>> >>> 2048"
>> >>>
>> >>> After looking at the documentation, we set the configuration based on
>> >>> the docs:
>> >>>
>> >>> taskmanager.network.numberOfBuffers: # core ^ 2 * # machine * 4
>> >>>
>> >>> However, we then hit another error from the JVM:
>> >>>
>> >>> java.io.IOException: Cannot allocate network buffer pool: Could not
>> >>> allocate enough memory segments for NetworkBufferPool (required (Mb):
>> >>> 2304, allocated (Mb): 698, missing (Mb): 1606). Cause: Java heap space
>> >>>
>> >>> We then fiddled with the heap size and set taskmanager.heap.mb: 4096
>> >>>
>> >>> Finally, the cluster is running.
>> >>>
>> >>> However, I'm still not sure about the configuration, or whether
>> >>> fiddling with the task manager heap really counts as fine-tuning. So
>> >>> my questions are:
>> >>>
>> >>> - Am I doing it right for numberOfBuffers?
>> >>> - How much should we allocate to taskmanager.heap.mb given the
>> >>>   information above?
>> >>> - Any suggestions on which configuration we need to set to make it
>> >>>   optimal for the cluster?
>> >>> - Is there any chance that this will get resolved automatically by
>> >>>   the memory/network buffer manager?
>> >>>
>> >>> Thanks a lot for the help
>> >>>
>> >>> Cheers
>> >>>
>> >>> --
>> >>> Welly Tambunan
>> >>> Triplelands
>> >>>
>> >>> http://weltam.wordpress.com
>> >>> http://www.triplelands.com



-- 
Welly Tambunan
Triplelands

http://weltam.wordpress.com
http://www.triplelands.com <http://www.triplelands.com/blog/>
