From: Aljoscha Krettek
Date: Thu, 13 Aug 2015 08:03:21 +0000
Subject: Re: Forward Partitioning & same Parallelism: 1:1 communication?
To: user@flink.apache.org

Hi,
I looked into it. Right now, when the specified partitioner is "FORWARD", the JobGraph that is generated from the StreamGraph will have the POINT-TO-POINT pattern specified. This doesn't work, however, if the parallelism differs, so the operators will not have a POINT-TO-POINT connection in the end. This results in the "REBALANCE" behavior you observed.

My PR makes it explicit which connection pattern will be used. It will also properly set the connection to "REBALANCE" if no partitioning is specified and the parallelism of the operators is different.
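
Very roughly, the per-edge decision boils down to something like the following simplified sketch (illustrative only - this is not the actual StreamingJobGraphGenerator code, and the enum and method names are made up):

// Simplified, self-contained illustration of the connection-pattern decision.
// NOT Flink's code; the types and names here are invented for illustration.
public class ConnectionPatternSketch {

    enum Partitioner { FORWARD, REBALANCE }

    enum ConnectionPattern { POINT_TO_POINT, ALL_TO_ALL }

    // A FORWARD edge can only stay point-to-point if both operators run with
    // the same parallelism; otherwise every upstream subtask must be able to
    // reach every downstream subtask (the rebalance behavior you observed).
    static ConnectionPattern choosePattern(Partitioner partitioner,
                                           int upstreamParallelism,
                                           int downstreamParallelism) {
        if (partitioner == Partitioner.FORWARD
                && upstreamParallelism == downstreamParallelism) {
            return ConnectionPattern.POINT_TO_POINT;
        }
        return ConnectionPattern.ALL_TO_ALL;
    }

    public static void main(String[] args) {
        // Parallelism 1 -> 4 with FORWARD: cannot stay point-to-point.
        System.out.println(choosePattern(Partitioner.FORWARD, 1, 4)); // ALL_TO_ALL
        // Parallelism 4 -> 4 with FORWARD: a true 1:1 connection.
        System.out.println(choosePattern(Partitioner.FORWARD, 4, 4)); // POINT_TO_POINT
    }
}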

I hope this helps somehow, let us know if you have any other questions.

Cheers,
Aljoscha

On Wed, 12 Aug 2015 at 11:49 Anneke Walter <Walteran@students.uni-marburg.de> wrote:
Hey Márton, hey Ufuk,

thank you for your replies, that was very helpful!

I now have an additional question based on Márton's answer to Ufuk's question (by the way, I'm currently working only with the streaming API, so I am more interested in answers concerning streaming than in batch processing... :-) )

In the second link Márton provided [1] it says:

"This was not very transparent: When you went from low parallelism to high dop some downstream operators would never get any input."

A question below the pull request then asks "If a non parallel source is used does the user need to call rebalance to use all parallel instances of the downstream operator?" and I don't think that question was explicitly answered. The closest thing to an explicit answer is "[so far] forward was assumed. This was valid for a change of parallelism, which led to either the degenerative case of only one downstream instance receiving elements (1 to n parallelism)".

To me, that sounds as if, up until now, in a situation where operator A has lower parallelism than the following downstream operator B (for example, source A with parallelism 1 and filter B with parallelism 4), not all instances of B would receive output from A if forward partitioning is used.

Now, in the docs [2] it says:
"*Forward (default)*: Forward partitioning directs the output data to the next operator on the same machine (if possible) avoiding expensive network I/O. _If there are more processing nodes than inputs or vice versa the load is distributed among the extra nodes in a round-robin fashion_. This is the default partitioner."

So far, I would've thought that the middle sentence means that when forward partitioning is used and the parallelism differs, outputs will be forwarded to the next operator on the same machine where possible, while some outputs are also distributed to the extra nodes in round-robin fashion. However, I've tested the setup described above (see below) and it seems that Flink uses "normal" round-robin partitioning (rebalance partitioning) when the parallelism differs - using round-robin for _all_ outputs, not doing any "forwarding" (in the forward partitioning sense). Is that correct?

My little test: 1 Sink, 4 Filters
I tried that with Flink 0.9 and, even though I did not explicitly specify any partitioning (so the default, forward, should have been used), Flink apparently uses rebalance partitioning in this case - from the log:
DEBUG StreamingJobGraphGenerator:235 Thread-1 - Parallelism set: 4 for 2
DEBUG StreamingJobGraphGenerator:235 Thread-1 - Parallelism set: 1 for 1
DEBUG StreamingJobGraphGenerator:312 Thread-1 - CONNECTED: RebalancePartitioner - 1 -> 2

The same thing happened in reverse when I went from the 4 filters (B) to 1 sink C; rebalancing was apparently used there as well.
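
For completeness, the test job was roughly of the following shape (a simplified sketch, not the exact job I ran; the dummy elements and names are made up):

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ForwardVersusRebalanceTest {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Operator 1: non-parallel source (parallelism 1).
        DataStream<Long> source = env.fromElements(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L);

        // Operator 2: filter with parallelism 4. No partitioner is set explicitly,
        // so the default (forward) should apply - yet the log above shows the
        // RebalancePartitioner being chosen for the 1 -> 2 edge.
        source
            .filter(new FilterFunction<Long>() {
                @Override
                public boolean filter(Long value) {
                    return value > 2L;
                }
            }).setParallelism(4)
            .print(); // the reversed test fed the parallelism-4 filter into a parallelism-1 sink

        env.execute("forward vs. rebalance test");
    }
}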

So that one problem (concerning downstream operators not receiving outputs when forward partitioning is used) described in the pull request is apparently already fixed in 0.9 - or does it only work correctly for the source/sink connection and not between other operators (I did not have time to try more scenarios)?

Again, I would be very happy about some input on whether I grasped Flink's behavior correctly! :-) Thanks in advance!
Nica

[1] https://github.com/apache/flink/pull/988
[2] https://ci.apache.org/projects/flink/flink-docs-release-0.9/apis/streaming_guide.html#partitioning

On 12.08.2015 at 11:50, Ufuk Celebi wrote:
Thanks :) Regarding your answer to Nica: I didn't mean to say that it was
too generic or anything... it was very nice. I was just curious, that's why
I asked.

On Wed, Aug 12, 2015 at 11:45 AM, Márton Balassi <balassi.marton@gmail.com>
wrote:

Hey Ufuk,

The shipping strategy name "forward" is shared between batch and streaming,
and Nica did not specify either API, so I tried to give a generic answer.

I assume that your question is specifically about streaming; in that case:
yes, streaming is using the pointwise distribution pattern. [1]
Unfortunately your concern is justified: currently streaming would leave extra
downstream operator instances idle, but Aljoscha has an open pull request
fixing this issue amongst others. See the discussion here. [2]

[1] https://github.com/apache/flink/blob/master/flink-staging/flink-streaming/flink-streaming-core/src/main/java/org/apache/flink/streaming/api/graph/StreamingJobGraphGenerator.java#L320
[2] https://github.com/apache/flink/pull/988
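
Until that PR is in, a user can work around idle downstream instances by requesting round-robin explicitly. A minimal sketch (assuming the DataStream methods from the 0.9 streaming guide; the dummy job itself is made up):

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RebalanceWorkaround {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "c", "d")   // non-parallel source
            .rebalance()                       // spread records round-robin over all downstream subtasks
            .map(new MapFunction<String, String>() {
                @Override
                public String map(String value) {
                    return value.toUpperCase();
                }
            }).setParallelism(4)               // parallel downstream operator
            .print();

        env.execute("rebalance workaround");
    }
}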

Cheers,

Marton

On Wed, Aug 12, 2015 at 11:33 AM, Ufuk Celebi <ufuk@data-artisans.com>
wrote:

Hey Marton,

out of curiosity: is this using Flink's "point" connections underneath or
is there some custom logic for streaming jobs?

What happens if operator B has 2 times the parallelism of operator A? For
example if there were parallel tasks A1 and A2 and B1-B4: would A1 send to
B1 *and* B2 or just B1?

– Ufuk

On 12 Aug 2015, at 10:39, Márton Balassi <balassi.marton@gmail.com>
wrote:

Dear Nica,

Yes, forward partitioning means that if consecutive operators share the same
parallelism then each parallel instance of an upstream operator sends its
output to exactly one instance of the downstream operator. This makes sense
for operators working on individual records, e.g. a typical map-filter pair,
because as a consequence Flink may be able to collocate these operator pairs
on the same physical machine.
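
As an illustration, such a map-filter pair with matching parallelism could look like the following minimal sketch (assuming the DataStream API as described in the streaming guide; the data and parallelism values are made up):

import org.apache.flink.api.common.functions.FilterFunction;
import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ForwardPairSketch {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "bb", "ccc", "dddd")
            // map and filter share the same parallelism, so the default forward
            // connection is a 1:1 mapping: map subtask i feeds exactly filter
            // subtask i, ideally on the same physical machine.
            .map(new MapFunction<String, Integer>() {
                @Override
                public Integer map(String value) {
                    return value.length();
                }
            }).setParallelism(2)
            .filter(new FilterFunction<Integer>() {
                @Override
                public boolean filter(Integer length) {
                    return length > 1;
                }
            }).setParallelism(2)
            .print();

        env.execute("forward map-filter pair");
    }
}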

Best,

Marton

On Tue, Aug 11, 2015 at 11:41 PM, Nicaz <Walteran@students.uni-marburg.de>
wrote:
Hello,

I have a question about forward partitioning in Flink.

If Operator A and Operator B have the same parallelism set and forward
partitioning is used for events coming from instances of A and going to
instances of B:

Will each instance of A send events to _exactly one_ instance of B?

That is, will all events coming from a specific instance of A go to the
_same_ specific instance of B, and will _all_ instances of B be used?
Or are there any situations where an instance of A will distribute events to
several different instances of B, or where two instances of A will send
events to the same instance of B (possibly leaving some other instance of B
unused)?

I'd be very happy if someone were able to shed some light on this issue. :-)

Thanks in advance
Nica



--
View this message in context: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Forward-Partitioning-same-Parallelism-1-1-communication-tp2373.html
Sent from the Apache Flink User Mailing List archive mailing list archive at Nabble.com.