flink-user mailing list archives

From Gwenhael Pasquiers <gwenhael.pasqui...@ericsson.com>
Subject RE: Distribution of sinks among the nodes
Date Thu, 04 Feb 2016 13:37:26 GMT
Sorry I was confused about the number of slots, it’s good now.

However, is disableChaining or disableOperatorChaining working properly ?
I called those methods everywhere I could, but it still seems that some of my operators are
being chained together : I can’t go over 16 used slots, where I should be at 24 if there was
no chaining …
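
One possible explanation for the 16-slot ceiling, offered as an assumption rather than a confirmed diagnosis: Flink's slot sharing is separate from operator chaining and lets subtasks of different operators occupy the same slot, so a job may need only as many slots as its highest parallelism. The plain-Java sketch below compares the two counts for the 4 / 16 / 4 pipeline discussed in this thread. The class and method names are hypothetical and no Flink API is used.

```java
// Hypothetical helper (not a Flink API): compares how many slots the
// 4 sources / 16 mappers / 4 sinks pipeline would need in two cases.
class SlotMath {
    // No sharing: every parallel subtask occupies its own slot.
    static int slotsWithoutSharing(int[] parallelisms) {
        int sum = 0;
        for (int p : parallelisms) {
            sum += p;
        }
        return sum;
    }

    // Assumed slot sharing: one slot can hold one subtask of each operator,
    // so the operator with the highest parallelism sets the requirement.
    static int slotsWithSharing(int[] parallelisms) {
        int max = 0;
        for (int p : parallelisms) {
            max = Math.max(max, p);
        }
        return max;
    }

    public static void main(String[] args) {
        int[] pipeline = {4, 16, 4}; // sources, mappers, sinks
        System.out.println("without sharing: " + slotsWithoutSharing(pipeline));
        System.out.println("with sharing:    " + slotsWithSharing(pipeline));
    }
}
```

If that assumption holds, 16 used slots would be the expected figure even with chaining fully disabled.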

From: Gwenhael Pasquiers [mailto:gwenhael.pasquiers@ericsson.com]
Sent: Thursday, 4 February 2016 09:55
To: user@flink.apache.org
Subject: RE: Distribution of sinks among the nodes

Don’t we need to set the number of slots to 24 (4 sources + 16 mappers + 4 sinks) ?

Or is there a way to set the number of slots per TaskManager instead of globally, so that
they are at least equally dispatched among the nodes ?

As for the sink deployment : that’s not good news ; I mean we will have a non-negligible
overhead : all the data generated by 3 of the 4 nodes will be sent over the network to another
node instead of being sent to the “local” sink. Network I/O has a price.
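
To put a rough number on that overhead, here is a minimal sketch. It assumes the even layout of 4 mappers per node and all 4 sinks landing on a single node, and it counts the mappers that would have to ship their output across the network. The class and method names are hypothetical.

```java
// Hypothetical helper: estimates how many mapper subtasks are remote
// from the sinks when all sinks are deployed on one node.
class SinkLocality {
    // mappersPerNode[i] is the number of mapper subtasks on node i;
    // sinkNode is the index of the node hosting every sink.
    static int remoteMappers(int[] mappersPerNode, int sinkNode) {
        int remote = 0;
        for (int i = 0; i < mappersPerNode.length; i++) {
            if (i != sinkNode) {
                remote += mappersPerNode[i];
            }
        }
        return remote;
    }

    // Share of mapper subtasks whose output must cross the network.
    static double remoteFraction(int[] mappersPerNode, int sinkNode) {
        int total = 0;
        for (int m : mappersPerNode) {
            total += m;
        }
        return (double) remoteMappers(mappersPerNode, sinkNode) / total;
    }

    public static void main(String[] args) {
        int[] evenLayout = {4, 4, 4, 4}; // assumed: 4 mappers on each of 4 nodes
        int sinkNode = 3;                // assumed: all sinks on the last node
        System.out.println(remoteMappers(evenLayout, sinkNode) + " remote mappers");
        System.out.println(remoteFraction(evenLayout, sinkNode) * 100 + " % of mappers");
    }
}
```

Under those assumptions, 12 of the 16 mappers send their results to a sink on another machine.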

Do you have some sort of “topology” feature coming in the roadmap ? Maybe a listener on
the JobManager / env that would be triggered, asking us on which node we would prefer each
task to be deployed. That way you keep the standard behavior, don’t have to make a complicated
generic-optimized algorithm, and let the user make their own choices. Should I create a JIRA ?

For the time being we could start the application 4 times : once per node, but that’s
not pretty at all ☺


From: Till Rohrmann [mailto:trohrmann@apache.org]
Sent: Wednesday, 3 February 2016 17:58
To: user@flink.apache.org
Subject: Re: Distribution of sinks among the nodes

Hi Gwenhäel,

if you set the number of slots for each TaskManager to 4, then all of your mappers will be
evenly spread out. The sources should also be evenly spread out. However, for the sinks since
they depend on all mappers, it will be most likely random where they are deployed. So you
might end up with 4 sink tasks on one machine.


On Wed, Feb 3, 2016 at 4:31 PM, Gwenhael Pasquiers <gwenhael.pasquiers@ericsson.com> wrote:
It is one type of mapper with a parallelism of 16
It's the same for the sinks and sources (parallelism of 4)

The settings are
Mapper.setParallelism(env.getParallelism() * 4)

We mean to have X mapper tasks per source / sink

The mapper is doing some heavy computation and we have only 4 kafka partitions. That's why
we need more mappers than sources / sinks

-----Original Message-----
From: Aljoscha Krettek [mailto:aljoscha@apache.org]
Sent: Wednesday, 3 February 2016 16:26
To: user@flink.apache.org
Subject: Re: Distribution of sinks among the nodes

Hi Gwenhäel,
when you say 16 maps, are we talking about one mapper with parallelism 16 or 16 unique map
operators ?

> On 03 Feb 2016, at 15:48, Gwenhael Pasquiers <gwenhael.pasquiers@ericsson.com> wrote:
> Hi,
> We are trying to deploy an application with the following “architecture” :
> 4 kafka sources => 16 maps => 4 kafka sinks, on 4 nodes, with 24 slots (we disabled
> operator chaining).
> So we’d like on each node :
> 1x source => 4x map => 1x sink
> That way there would be no exchanges between different instances of Flink and performance
> would be optimal.
> But we get (according to the Flink GUI and the Host column when looking at the details
> of each task) :
> Node 1 : 1 source => 2 maps
> Node 2 : 1 source => 1 map
> Node 3 : 1 source => 1 map
> Node 4 : 1 source => 12 maps => 4 sinks
> (I think no comments are needed ☺)
> The Web UI says that there are 24 slots and they are all used, but they don’t seem
> evenly dispatched …
> How could we make Flink deploy the tasks the way we want ?
> B.R.
> Gwen’
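
The imbalance in the original message can be tallied with a small sketch. It counts subtasks per node (not slots, since several subtasks may share a slot) for the observed placement and for the desired one, using only the figures quoted above. The class and method names are hypothetical.

```java
// Hypothetical helper: tallies subtasks per node from a placement table.
class PlacementTally {
    // Each row describes one node as {sources, maps, sinks}.
    static int[] subtasksPerNode(int[][] placement) {
        int[] counts = new int[placement.length];
        for (int node = 0; node < placement.length; node++) {
            for (int tasks : placement[node]) {
                counts[node] += tasks;
            }
        }
        return counts;
    }

    static int total(int[] counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Placement reported in the message above.
        int[][] observed = {{1, 2, 0}, {1, 1, 0}, {1, 1, 0}, {1, 12, 4}};
        // Placement the author wanted: 1 source, 4 maps, 1 sink per node.
        int[][] desired = {{1, 4, 1}, {1, 4, 1}, {1, 4, 1}, {1, 4, 1}};

        System.out.println("observed: " + java.util.Arrays.toString(subtasksPerNode(observed)));
        System.out.println("desired:  " + java.util.Arrays.toString(subtasksPerNode(desired)));
    }
}
```

Both layouts add up to 24 subtasks; the observed one puts 17 of them on Node 4, against 6 per node in the desired layout.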
