Subject: Re: Case of possible join optimization
From: Stephan Ewen <sewen@apache.org>
To: user@flink.apache.org
Date: Tue, 8 Sep 2015 11:22:09 +0200

The problem is the "getInput2()" call. It takes the input to the join, not the result of the join. That way, the first join never happens.
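As a minimal, self-contained sketch of the difference (the String payloads are hypothetical stand-ins for the List<MyObject> / List<ThriftObj> values from the thread):

    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.tuple.Tuple3;

    public class JoinResultVsInput {
        public static void main(String[] args) throws Exception {
            ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

            // Hypothetical stand-ins for the datasets in the thread.
            DataSet<Tuple2<String, String>> attrToExpand =
                    env.fromElements(new Tuple2<>("k1", "attrs"));
            DataSet<Tuple2<String, String>> bigDataset =
                    env.fromElements(new Tuple2<>("k1", "payload"));

            // getInput2() hands back the join's second *input* (bigDataset),
            // so the join itself never becomes part of the executed plan.
            DataSet<Tuple2<String, String>> justBigDataset = attrToExpand
                    .join(bigDataset).where(0).equalTo(0)
                    .getInput2();

            // Using the join result keeps the join in the plan.
            DataSet<Tuple3<String, String, String>> joined = attrToExpand
                    .join(bigDataset).where(0).equalTo(0)
                    .projectFirst(0, 1).projectSecond(1);

            joined.print();
        }
    }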
On Tue, Sep 8, 2015 at 11:10 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:

> Obviously, when trying to simplify my code, I didn't substitute the
> variable of the join correctly. It should be:
>
> DataSet<Tuple3<String, List<MyObject>, List<ThriftObj>>> atomSubset =
>     attrToExpand.join(subset).where(0).equalTo(0).projectFirst(0,1).projectSecond(1);
>
> Do you think that a JoinHint to create a sort-merge join is equivalent
> to my solution?
>
> On Tue, Sep 8, 2015 at 10:45 AM, Stephan Ewen <sewen@apache.org> wrote:
>
>> Hi Flavio!
>>
>> No, Flink does not join keys before full values. That is very often
>> very inefficient, as it effectively results in two joins, where one is
>> typically about as expensive as the original join.
>>
>> One can do a "semi-join reduction" in case the join filters out many
>> values (many elements from one side do not find a match in the other
>> side). If the join does not filter, this does not help either.
>>
>> Your code is a bit of a surprise, especially because in your solution
>> that worked, the first statement does nothing:
>>
>> DataSet<Tuple2<String, List<ThriftObj>>> subset =
>>     attrToExpand.project(0).joinWithHuge(bigDataset).where(0).equalTo(0).getInput2();
>>
>> This builds a join, but then takes the second input of the join (the
>> bigDataset data set). Because the result of the join is never actually
>> used, it is never executed. The second statement hence is effectively:
>>
>> DataSet<Tuple3<String, List<MyObject>, List<ThriftObj>>> atomSubset =
>>     attrToExpand.join(bigDataset).where(0).equalTo(0).projectFirst(0,1).projectSecond(1);
>>
>> I am curious why this executed when the original did not.
>>
>> BTW: If the lists are so long that they do not fit into a hash-table
>> memory partition, you can try a JoinHint to create a sort-merge join.
>> It may be slower, but it typically works with even less memory.
>>
>> Greetings,
>> Stephan
>>
>> On Tue, Sep 8, 2015 at 9:59 AM, Flavio Pompermaier <pompermaier@okkam.it> wrote:
>>
>>> Hi to all,
>>>
>>> I have a case where I don't understand why Flink is not able to
>>> optimize the join between two datasets.
>>>
>>> My initial code was basically this:
>>>
>>> DataSet<Tuple2<String, List<ThriftObj>>> bigDataset = ...; // 5,257,207 elements
>>> DataSet<Tuple2<String, List<MyObject>>> attrToExpand = ...; // 65,000 elements
>>>
>>> DataSet<Tuple2<String, IndexAttributeToExpand>> tmp =
>>>     attrToExpand.joinWithHuge(subset).where(0).equalTo(0).projectFirst(0,1).projectSecond(1);
>>>
>>> This job wasn't able to complete on my local machine (from Eclipse)
>>> because Flink was giving me the following error:
>>>
>>> Hash join exceeded maximum number of recursions, without reducing
>>> partitions enough to be memory resident. Probably cause: Too many
>>> duplicate keys.
>>>
>>> This was because in attrToExpand the List<MyObject> could be quite
>>> big. Indeed, changing that code to the following makes everything
>>> work like a charm:
>>>
>>> DataSet<Tuple2<String, List<ThriftObj>>> subset =
>>>     attrToExpand.project(0).joinWithHuge(bigDataset).where(0).equalTo(0).getInput2();
>>>
>>> DataSet<Tuple3<String, List<MyObject>, List<ThriftObj>>> atomSubset =
>>>     attrToExpand.join(subset).where(0).equalTo(0).projectFirst(0,1).projectSecond(1);
>>>
>>> Shouldn't it be possible for Flink to optimize my initial code into
>>> the second? I was convinced that Flink first performed the join only
>>> on the keys before also pulling the other elements of the tuples into
>>> memory. Am I wrong?
>>>
>>> Best,
>>> Flavio
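Regarding the JoinHint question, a rough sketch of passing the sort-merge hint in the DataSet API, reusing the simplified placeholder types from the sketch above:

    import org.apache.flink.api.common.operators.base.JoinOperatorBase.JoinHint;

    // Ask the optimizer for a sort-merge join instead of the default
    // hybrid hash join, whose partition recursion failed here on the
    // duplicate-heavy keys.
    DataSet<Tuple3<String, String, String>> atomSubset = attrToExpand
            .join(bigDataset, JoinHint.REPARTITION_SORT_MERGE)
            .where(0).equalTo(0)
            .projectFirst(0, 1).projectSecond(1);

This is not literally equivalent to the two-step variant in the thread: it is still a single join between the full datasets, just executed with a merge strategy that, as noted above, may be slower but typically needs less memory.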
The problem is the "getInput2()" call. It takes = the input to the join, not the result of the join. That way, the first join= never happens.

On Tue, Sep 8, 2015 at 11:10 AM, Flavio Pompermaier &= lt;pompermaier@ok= kam.it> wrote:
Obviously when trying to simplify my code I didn't substitute corr= ectly the variable of the join..it should be:

DataS= et<Tuple3<String, List<MyObject>, List<ThriftObj>>>= atomSubset =3D
=C2=A0 =C2=A0 =C2=A0 attrToExpand.join(subset).w= here(0).equalTo(0).projectFirst(0,1).projectSecond(1);

Do you think that a Join= Hint to create a sort-merge join is equivalent to my solution?
<= div class=3D"h5">

On Tue, Sep 8, 2015 at 10:45 AM, Stephan Ewen <sewen@apache.org>= wrote:
Hi Flavio!

=
No, Flink does not join keys before full values. That is very of= ten very inefficient, as it results effectively in two joins where one is t= ypically about as expensive as the original join.

= One can do "semi-join-reduction", in case the join filters out ma= ny values (many elements from one side do not find a match in the other sid= e). If the join does not filter, this does not help either.

<= /div>
Your code is a bit of a surprise. Especially, because in you solu= tion that worked, the first statement does nothing:

DataSet<Tuple2<String, List&l= t;ThriftObj>>> subset =3D
=C2= =A0 =C2=A0 =C2=A0 attrToExpand.project(0).joinWithHuge(bigDataset).where(0)= .equalTo(0).getInput2();


This builds a join, but then takes the second input of the join (t= he=C2=A0bigDataset=C2=A0data set). = Because the result of the join is never actually=C2=A0used, it is never exe= cuted. The second statement hence is effectively

DataSet<Tuple3<String, List<MyObject>, List<= ThriftObj>>> atomSubset =3D=C2=A0
=C2=A0 =C2=A0 =C2=A0 attrToExpand.join(bigDataset).where(0).equ= alTo(0).projectFirst(0,1).projectSecond(1);


Curious why this executed when the original did not.

BTW= : If the Lists are very long so they do not fit into a hashtable memory par= tition, you can try to use a JoinHint to create a sort-merge join. It may b= ecome slower, but typically works with even less memory.


Greetings,
Stephan


=
On Tue, Sep 8, 2015 at 9:59 AM, Flavio Pompermai= er <pompermaier@okkam.it> wrote:
Hi to all,

I have a case where I don't understand why flink is not able to op= timize the join between 2 datasets.

My initial cod= e was basically this:

DataSet<Tuple2<String,= List<ThriftObj>>> bigDataset =3D ...;//5.257.207 elements
<= /div>
DataSet<Tuple2<String,List<MyObject>>> attrToEx= pand =3D ...;//65.000=C2=A0elements

DataSet<= ;Tuple2<String, IndexAttributeToExpand>> tmp =3D=C2=A0
attrToExpand.joinWithHuge(subset).where(0).equalTo(0).projectFirst(0,1).= projectSecond(1);

This job wasn't able to = complete on my local machine (from Eclipse) because Flink was giving me the= following error:

Hash join exceeded maximum = number of recursions, without reducing partitions enough to be memory resid= ent. Probably cause: Too many duplicate keys.

This was because in attrToExpand the List<MyObject> could be quite = big. Indeed, changing that code to the following make everything work like = a charm:

DataSet<Tuple2<String, List<= ;ThriftObj>>> subset =3D
=C2=A0 =C2=A0 =C2=A0 attrToExpa= nd.project(0).joinWithHuge(bigDataset).where(0).equalTo(0).getInput2();

DataSet<Tuple3<String, List<MyObject>, Li= st<ThriftObj>>> atomSubset =3D=C2=A0
=C2=A0 =C2=A0 = =C2=A0 attrToExpand.join(subset).where(0).equalTo(0).projectFirst(0,1).proj= ectSecond(1);


Isn't somet= hing impossible for Flink to optimize my initial code into the second? I wa= s convinced that Flink was performing a join only on the keys before grabbi= ng also the other elements of the Tuples into memory..am I wrong?

Best,
Flavio



=


--001a11407dba87b6b3051f38e651--
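The semi-join reduction Stephan mentions can be sketched roughly like this, again with the simplified placeholder types (it only pays off when the join actually filters out many records):

    import org.apache.flink.api.java.tuple.Tuple1;

    // Distinct keys of the small side.
    DataSet<Tuple1<String>> keys = attrToExpand.project(0);
    DataSet<Tuple1<String>> distinctKeys = keys.distinct();

    // Reduce the big side to the records that will find a match.
    DataSet<Tuple2<String, String>> reducedBig = bigDataset
            .join(distinctKeys).where(0).equalTo(0)
            .projectFirst(0, 1);

    // The full join now runs against the reduced big side only.
    DataSet<Tuple3<String, String, String>> result = attrToExpand
            .join(reducedBig).where(0).equalTo(0)
            .projectFirst(0, 1).projectSecond(1);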