flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Márton Balassi <balassi.mar...@gmail.com>
Subject Re: Forward Partitioning & same Parallelism: 1:1 communication?
Date Wed, 12 Aug 2015 09:45:15 GMT
Hey Ufuk,

The shipping strategy name forward is shared between batch and streaming
and Nica did not specify either API, so I tried to give a generic answer.

I assume that your question is specifically for streaming, in that case:
Yes, streaming is using the pointwise distribution pattern. [1]
Unfortunately your concern is true, currently streaming would leave extra
downstream operator instances idle, but Aljoscha has an open pull request
fixing this issue amongst others. See the discussion here. [2]

[1]
https://github.com/apache/flink/blob/master/flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java#L320
[2] https://github.com/apache/flink/pull/988

Cheers,

Marton

On Wed, Aug 12, 2015 at 11:33 AM, Ufuk Celebi <ufuk@data-artisans.com>
wrote:

> Hey Marton,
>
> out of curiosity: is this using Flink’s “point” connections underneath or
> is there some custom logic for streaming jobs?
>
> What happens if operator B has 2 times the parallelism of operator A? For
> example if there were parallel tasks A1 and A2 and B1-B4: would A1 send to
> B1 *and* B2 or just B1?
>
> – Ufuk
>
> On 12 Aug 2015, at 10:39, Márton Balassi <balassi.marton@gmail.com> wrote:
>
> Dear Nica,
>
> Yes, forward partitioning means that if subsequent operators share
> parallelism then the output of an upstream operator is sent to exactly
> one downstream operator. This makes sense for operators working on
> individual records, e.g. a typical map-filter pair, because as a
> consequence Flink may be able to collocate these operator pairs on the same
> physical machine.
>
> Best,
>
> Marton
>
> On Tue, Aug 11, 2015 at 11:41 PM, Nicaz <Walteran@students.uni-marburg.de
> > wrote:
> Hello,
>
> I have a question about forward partitioning in Flink.
>
> If Operator A and Operator B have the same parallelism set and forward
> partitioning is used for events coming from instances of A and going to
> instances of B:
>
> Will each instance of A send events to _exactly one_ instance of B?
>
> That is, will all events coming from a specific instance of A go to the
> _same_ specific instance of B, and will _all_ instances of B be used?
> Or are there any situations where an instance of A will distribute events
> to
> several different instances of B, or where two instances of A will send
> events to the same instance of B (possibly leaving some other instance of B
> unused)?
>
> I'd be very happy if someone were able to shed some light on this issue.
> :-)
>
> Thanks in advance
> Nica
>
>
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Forward-Partitioning-same-Parallelism-1-1-communication-tp2373.html
> Sent from the Apache Flink User Mailing List archive. mailing list archive
> at Nabble.com.
>
>
>

Mime
View raw message