flink-user mailing list archives

From Bill Sparks <jspa...@cray.com>
Subject scaling flink
Date Fri, 05 Jun 2015 15:16:24 GMT
Hi.

I'm running some comparisons between Flink, MRv2, and Spark (1.3), using the new Intel HiBench
suite. I've started with the stock WordCount example, and the numbers I'm seeing are
not where I expected them to be.

So my question is: what are the configuration parameters that can affect performance?
Is there a performance/tuning guide?

Hardware-wise, we have 48 Haswell nodes, each with 32 physical/64 HT cores and 128 GB, connected
via FDR. I'm parsing 2 TB of text, using the following parameters.

./bin/flink run -m yarn-cluster \
-yD fs.overwrite-files=true \
-yD fs.output.always-create-directory=true \
-yq \
-yn $((666)) \
-yD taskmanager.numberOfTaskSlots=$((1)) \
-yD parallelization.degree.default=$((666)) \
-ytm $((4*1024)) \
-yjm $((4*1024)) \
./examples/flink-java-examples-0.9-SNAPSHOT-WordCount.jar \
hdfs:///user/jsparks/HiBench/Wordcount/Input \
hdfs:///user/jsparks/HiBench/Wordcount/Output
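
As a rough sanity check on the request above, the following sketch compares what the command asks for with what the cluster offers. It is only an illustration under stated assumptions: that `-yn` is the number of TaskManager containers, that `-ytm` and `-yjm` are per-container memory in MB, that all 48 nodes and their full 128 GB are available to YARN, and that YARN's own per-container overhead can be ignored. The function name `footprint` is made up for this example.

```python
# Cluster figures from the message: 48 nodes, 32 physical cores and 128 GB each.
NODES = 48
PHYSICAL_CORES_PER_NODE = 32
MEMORY_GB_PER_NODE = 128

# Values taken from the flink run command.
TASK_MANAGERS = 666          # -yn
SLOTS_PER_TASK_MANAGER = 1   # taskmanager.numberOfTaskSlots
TASK_MANAGER_MB = 4 * 1024   # -ytm
JOB_MANAGER_MB = 4 * 1024    # -yjm


def footprint():
    """Return requested vs. available slots and memory (hypothetical helper)."""
    total_slots = TASK_MANAGERS * SLOTS_PER_TASK_MANAGER
    total_cores = NODES * PHYSICAL_CORES_PER_NODE
    # Assumption: memory request is simply containers * container size.
    requested_gb = (TASK_MANAGERS * TASK_MANAGER_MB + JOB_MANAGER_MB) / 1024
    available_gb = NODES * MEMORY_GB_PER_NODE
    return {
        "total_slots": total_slots,
        "physical_cores": total_cores,
        "slots_per_core": total_slots / total_cores,
        "requested_gb": requested_gb,
        "available_gb": available_gb,
        "memory_fraction": requested_gb / available_gb,
    }


if __name__ == "__main__":
    for key, value in footprint().items():
        print(f"{key}: {value}")
```

Under these assumptions the job asks for fewer slots than there are physical cores and well under the total memory, which may be worth keeping in mind when reading the timings.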

Any pointers would be greatly appreciated.


Type                 Date        Time      Input_data_size  Duration(s)  Throughput(bytes/s)  Throughput/node
HadoopWordcount      2015-06-03  10:45:11  2052360935068    763.106      2689483420           2689483420
JavaSparkWordcount   2015-06-03  10:55:24  2052360935068    411.246      4990591847           4990591847
ScalaSparkWordcount  2015-06-03  11:06:24  2052360935068    342.777      5987452294           5987452294

Type                 Date        Time      Input_data_size  Duration(s)  Throughput(bytes/s)  Throughput/node
flinkWordCount       2015-06-04  16:27:27  2052360935068    647.383      3170242244           66046713
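
The throughput columns can be re-derived from the input size and duration. The sketch below does that for the four runs in the tables; it assumes throughput is simply input bytes divided by duration, and that the Flink row's Throughput/node is the total divided by the 48 nodes mentioned earlier. The Hadoop and Spark rows report a Throughput/node equal to the total, so the two reports appear to use different node counts. The helper names are made up for this example.

```python
INPUT_BYTES = 2052360935068
NODES = 48  # assumption: node count taken from the hardware description

# name -> (duration in seconds, reported throughput in bytes/s), from the tables
RUNS = {
    "HadoopWordcount": (763.106, 2689483420),
    "JavaSparkWordcount": (411.246, 4990591847),
    "ScalaSparkWordcount": (342.777, 5987452294),
    "flinkWordCount": (647.383, 3170242244),
}


def throughput(input_bytes, duration_s):
    """Bytes processed per second of wall-clock time."""
    return input_bytes / duration_s


def slowdown_vs(baseline, name):
    """How many times longer `name` ran than `baseline`."""
    return RUNS[name][0] / RUNS[baseline][0]


if __name__ == "__main__":
    for name, (duration, reported) in RUNS.items():
        computed = throughput(INPUT_BYTES, duration)
        print(f"{name}: computed {computed:.0f} B/s, reported {reported} B/s")
    flink_total = RUNS["flinkWordCount"][1]
    print("flink per node:", flink_total // NODES)
    print("flink vs ScalaSpark duration ratio:",
          round(slowdown_vs("ScalaSparkWordcount", "flinkWordCount"), 2))
```

Because the durations are rounded to milliseconds, the recomputed totals match the reported ones only to within a few thousand bytes per second.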


--
Jonathan (Bill) Sparks
Software Architecture
Cray Inc.
