flink-user mailing list archives

From Timur Shenkao <...@timshenkao.su>
Subject Re: Speedup of Flink Applications
Date Tue, 03 Jan 2017 14:44:47 GMT
Hi,
It seems your questions are too abstract & theoretical. The answer is: it
depends on several factors, such as skewness in the data, data volume, reliability
requirements, "fatness" of the servers, whether one performs look-ups in other
data sources, etc.
The papers you mentioned show the following: under concrete & specific
conditions, the researchers achieved their results. If they had changed some
parameters slightly (increased the network's throughput, for example, or changed
the garbage collector's options), the results would have been completely
different.
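To make the Amdahl's law point from the question below a bit more concrete, here is a toy model. It is not a measurement of Flink or Spark. It assumes a fixed serial fraction s and a hypothetical coordination cost c for every worker beyond the first, so the normalized runtime on p workers is T(p) = s + (1 - s)/p + c(p - 1) and the speedup is 1/T(p). With c = 0 this is plain Amdahl's law, bounded by 1/s; with c > 0 the curve peaks near p = sqrt((1 - s)/c) and then falls (parallel slowdown). The function names and parameter values are made up for illustration and are not taken from the papers.

```python
import math


def amdahl_speedup(p: int, serial_fraction: float) -> float:
    """Plain Amdahl's law: speedup on p workers, bounded above by 1 / serial_fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / p)


def speedup_with_overhead(p: int, serial_fraction: float, comm_cost: float) -> float:
    """Amdahl's law plus a hypothetical coordination cost per worker beyond the first.

    Runtime is normalized so that one worker takes 1.0:
        T(p) = s + (1 - s) / p + c * (p - 1)
    comm_cost (c) is an assumed stand-in for network/shuffle overhead, not a Flink metric.
    """
    if p < 1:
        raise ValueError("p must be >= 1")
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    runtime = serial_fraction + (1.0 - serial_fraction) / p + comm_cost * (p - 1)
    return 1.0 / runtime


def best_parallelism(serial_fraction: float, comm_cost: float, max_p: int = 1024) -> int:
    """Brute-force search for the worker count with the highest modeled speedup."""
    return max(
        range(1, max_p + 1),
        key=lambda p: speedup_with_overhead(p, serial_fraction, comm_cost),
    )


def analytic_optimum(serial_fraction: float, comm_cost: float) -> float:
    """Continuous optimum (requires comm_cost > 0).

    Minimizing (1 - s)/p + c*p over p gives p* = sqrt((1 - s) / c).
    """
    return math.sqrt((1.0 - serial_fraction) / comm_cost)


if __name__ == "__main__":
    s = 0.05  # assumed: 5% of the work cannot be parallelized
    # Two made-up coordination costs, e.g. a slower vs. a 10x faster network.
    for c in (0.001, 0.0001):
        print(f"comm_cost={c}: best p = {best_parallelism(s, c)} "
              f"(analytic ~{analytic_optimum(s, c):.1f})")
        for p in (1, 8, 32, 128, 512):
            print(f"  p={p:4d}  Amdahl={amdahl_speedup(p, s):6.2f}  "
                  f"with overhead={speedup_with_overhead(p, s, c):6.2f}")
```

Lowering comm_cost (a faster network, in this toy model) moves the peak to a larger p, which is one way to see why a speedup curve measured on one cluster does not transfer directly to another.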

On Tuesday, January 3, 2017, Hanna Prinz <hanna_prinz@yahoo.de> wrote:

> Happy new year everyone :)
>
> I’m currently working on a paper about Flink. I already got some
> recommendations on general papers with details about Flink, which have helped me
> a lot already. But now that I’ve read them, *I’m further interested in the
> speedup capabilities provided by the Flink framework: How „far“ can it
> scale efficiently?*
>
> Amdahl’s law states that the achievable speedup is bounded by the
> non-parallelizable part of the processing. If that part, together with the
> overhead of parallelization (time for the communication between the nodes
> etc.), „eats up“ the speed gains, adding more nodes no longer helps or even
> slows the job down (= parallel slowdown).
> Of course, the communication overhead is mostly caused by the
> implementation, but the framework’s specific solution for the communication
> between the nodes has a noticeable effect as well.
>
> After studying these papers, it looks as if, although Flink’s performance is
> better in many cases, its possible speedup is equal to that of Spark.
>
> 1. Spark versus Flink - Understanding Performance in Big Data Analytics
> Frameworks | https://hal.inria.fr/hal-01347638/document
> 2. Big Data Analytics on Cray XC Series DataWarp using Hadoop, Spark and
> Flink | https://cug.org/proceedings/cug2016_proceedings/includes/files/pap141.pdf
> 3. Thrill - High-Performance Algorithmic Distributed Batch Data Processing
> with C++ | https://panthema.net/2016/0816-Thrill-High-Performance-Algorithmic-Distributed-Batch-Data-Processing-with-CPP/1608.05634v1.pdf
>
>
> Does anyone have …
> … more information (or data) on the speedup of Flink applications?
> … experience (or data) with Flink in an extremely parallelized environment?
> … detailed information on how the nodes communicate, especially when they
> are waiting for each other’s task results?
>
> Thank you very much for your time & answers!
> Hanna
>
