Subject: Re: scaling flink
From: Stephan Ewen <sewen@apache.org>
To: user@flink.apache.org
Date: Fri, 5 Jun 2015 19:48:48 +0200

It was supposed to mean "please PING us" ;-)

On Fri, Jun 5, 2015 at 7:21 PM, Stephan Ewen wrote:

> Hi Bill!
>
> For the WordCount case, these numbers are not unexpected. Flink does not
> yet use a hash aggregator for the "reduce(v1, v2)" call, but uses a
> sort-based aggregation for that. Flink's sort aggregations are very
> reliable and very scalable compared to many hash aggregations, but often
> more expensive. Especially on low-key-cardinality data sets, hash
> aggregations outperform sort aggregations.
>
> It is on the roadmap to add a managed-memory hash aggregator that is
> reliable.
> For now, Flink's runtime has managed-memory sorts and hash-joins,
> so we stuck with reliability over performance.
>
> It is cool to see that you are doing an evaluation and we are very curious
> about your outcomes. Please let us know how it looks for other operations
> and patterns, like joins, iterations, ...
>
>
> Concerning performance tuning, here are a few pointers that may be
> interesting:
>
>   - You are using a lot of very small TaskManagers, each with one slot. It
> will most likely be faster if you use fewer TaskManagers with more slots,
> because then the network stack is shared between more tasks. This results
> in fewer TCP connections, which each carry more data. You could try "-yn
> $((111)) -ytm $((24*1024)) -yD taskmanager.numberOfTaskSlots=$((6))" for
> example.
>
>   - The example word-count implementation is not particularly tuned; I
> think one can do better there.
>
>   - Flink has a mode to reuse objects, which takes a bit of pressure off
> the garbage collector. Where objects are not cached by the user code, this
> may help reduce the pressure that user code imposes on the garbage collector.
>
>
> BTW: Are you including the YARN startup time, or are you measuring from
> when the program execution starts?
>
>
> Please pig us if you have more questions!
>
>
> Greetings,
> Stephan
>
>
> On Fri, Jun 5, 2015 at 5:16 PM, Bill Sparks <jsparks@cray.com> wrote:
>
>> Hi.
>>
>> I'm running some comparisons between Flink, MRv2, and Spark (1.3), using
>> the new Intel HiBench suite. I've started with the stock wordcount example
>> and I'm seeing some numbers which are not where I thought they'd be.
>>
>> So the question I have is: what are the configuration parameters which
>> can affect the performance? Is there a performance/tuning guide?
>>
>> What we have, hardware-wise, are 48 Haswell nodes with 32 physical/64 HT
>> cores and 128 GB each, FDR-connected. I'm parsing 2 TB of text, using the
>> following parameters.
>>
>> ./bin/flink run -m yarn-cluster \
>>   -yD fs.overwrite-files=true \
>>   -yD fs.output.always-create-directory=true \
>>   -yq \
>>   -yn $((666)) \
>>   -yD taskmanager.numberOfTaskSlots=$((1)) \
>>   -yD parallelization.degree.default=$((666)) \
>>   -ytm $((4*1024)) \
>>   -yjm $((4*1024)) \
>>   ./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar \
>>   hdfs:///user/jsparks/HiBench/Wordcount/Input \
>>   hdfs:///user/jsparks/HiBench/Wordcount/Output
>>
>> Any pointers would be greatly appreciated.
>>
>> Type                 Date        Time      Input_data_size  Duration(s)  Throughput(bytes/s)  Throughput/node
>> HadoopWordcount      2015-06-03  10:45:11  2052360935068    763.106      2689483420           2689483420
>> JavaSparkWordcount   2015-06-03  10:55:24  2052360935068    411.246      4990591847           4990591847
>> ScalaSparkWordcount  2015-06-03  11:06:24  2052360935068    342.777      5987452294           5987452294
>>
>> Type                 Date        Time      Input_data_size  Duration(s)  Throughput(bytes/s)  Throughput/node
>> flinkWordCount       2015-06-04  16:27:27  2052360935068    647.383      3170242244           66046713
>>
>>
>> --
>> Jonathan (Bill) Sparks
>> Software Architecture
>> Cray Inc.
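Stephan's sort- vs. hash-aggregation point can be illustrated with a small sketch (plain Python for illustration only, not Flink internals): the sort-based strategy orders all records by key and then reduces each adjacent run, while the hash-based one folds every record into a per-key table, which stays small when key cardinality is low.

```python
from itertools import groupby

def sort_based_wordcount(words):
    # Sort every record by key, then reduce each adjacent run of equal keys.
    return {key: sum(1 for _ in run) for key, run in groupby(sorted(words))}

def hash_based_wordcount(words):
    # Fold each record into a per-key hash table; no global sort needed.
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    return counts

words = ["to", "be", "or", "not", "to", "be"]
assert sort_based_wordcount(words) == hash_based_wordcount(words) == {
    "to": 2, "be": 2, "or": 1, "not": 1,
}
```

Both produce the same result; the difference is cost: sorting is O(n log n) regardless of key count, while the hash table does O(n) work and stays tiny on low-cardinality data like word counts.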
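A quick sanity check on the result tables above (assuming the 48 nodes from Bill's hardware description): Throughput(bytes/s) is Input_data_size divided by Duration(s), and the flink row's Throughput/node further divides by 48, whereas the Hadoop and Spark rows appear to repeat the total instead.

```python
# Reproduce the arithmetic behind the flinkWordCount row of the table above.
input_bytes = 2052360935068  # Input_data_size (same ~2 TB dataset for all runs)
duration_s = 647.383         # flinkWordCount Duration(s)
nodes = 48                   # cluster size from the hardware description

throughput = input_bytes / duration_s   # total bytes/s
per_node = throughput / nodes           # bytes/s per node

# Matches the table: ~3170242244 bytes/s total, ~66046713 bytes/s per node.
assert abs(throughput - 3170242244) < 10
assert abs(per_node - 66046713) < 10
```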