flink-user mailing list archives

From Hanna Prinz <hanna_pr...@yahoo.de>
Subject Speedup of Flink Applications
Date Tue, 03 Jan 2017 11:25:27 GMT
Happy new year everyone :)

I’m currently working on a paper about Flink. I already received some recommendations for general
papers with details about Flink, which helped me a lot. Now that I have read them,
I’m also interested in the speedup capabilities provided by the Flink framework: How
"far" can it scale efficiently?

Amdahl’s law states that the achievable speedup is limited by the non-parallelizable
part of the processing. On top of that, coordination costs (time for the communication between
the nodes etc.) can "eat up" the speed gains of parallelization (= parallel slowdown).
Of course, the communication overhead is mostly caused by the application’s implementation, but the
framework’s specific solution for the communication between the nodes has a noticeable effect as well.
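
A minimal sketch of this argument, assuming a fixed problem size: the first function is the standard Amdahl's law formula, 1 / (s + (1 - s) / n) for serial fraction s and n workers. The second adds a communication cost that grows linearly with the number of workers. That linear cost model, the parameter name `overhead_per_worker`, and the numbers used are illustrative assumptions, not measurements of Flink or Spark.

```python
def amdahl_speedup(serial_fraction, workers):
    """Ideal speedup by Amdahl's law for a fixed problem size."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


def speedup_with_overhead(serial_fraction, workers, overhead_per_worker):
    """Amdahl's law plus a hypothetical linear communication cost.

    overhead_per_worker is an assumed cost, expressed as a fraction of the
    single-worker runtime, that is paid for every additional worker.
    """
    parallel_time = (1.0 - serial_fraction) / workers
    comm_time = overhead_per_worker * (workers - 1)
    return 1.0 / (serial_fraction + parallel_time + comm_time)


if __name__ == "__main__":
    # Assumed values: 5% serial work, 0.1% overhead per extra worker.
    for n in (1, 2, 4, 8, 16, 32, 64, 128):
        ideal = amdahl_speedup(0.05, n)
        real = speedup_with_overhead(0.05, n, 0.001)
        print(f"{n:4d} workers: ideal {ideal:6.2f}x, with overhead {real:6.2f}x")
```

With these assumed values the ideal curve flattens out below 20x (the 1 / 0.05 limit), while the curve with overhead peaks at roughly 30 workers and then falls again, which is the parallel slowdown described above.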

After studying these papers, it looks like, although Flink’s performance is better in many
cases, the possible speedup is equal to that of Spark.
1. Spark versus Flink - Understanding Performance in Big Data Analytics Frameworks | https://hal.inria.fr/hal-01347638/document
2. Big Data Analytics on Cray XC Series DataWarp using Hadoop, Spark and Flink | https://cug.org/proceedings/cug2016_proceedings/includes/files/pap141.pdf
3. Thrill - High-Performance Algorithmic Distributed Batch Data Processing with C++ | https://panthema.net/2016/0816-Thrill-High-Performance-Algorithmic-Distributed-Batch-Data-Processing-with-CPP/1608.05634v1.pdf

Does anyone have …
… more information (or data) on the speedup of Flink applications?
… experience (or data) with Flink in an extremely parallelized environment?
… detailed information on how the nodes communicate, especially when they are waiting for
one another’s task results?

Thank you very much for your time & answers!