From: Hanna Prinz
To: user@flink.apache.org
Subject: Speedup of Flink Applications
Date: Tue, 3 Jan 2017 12:25:27 +0100
Message-Id: <7A05C435-792E-4965-9B97-D65BCBF501CC@yahoo.de>

Happy new year everyone :)

I’m currently working on a paper about Flink. I already got some recommendations for general papers with details about Flink, which have helped me a lot. Now that I have read them, I’m particularly interested in the speedup capabilities of the Flink framework: how “far” can it scale efficiently?

Amdahl’s law states that parallelization is only efficient as long as the non-parallelizable part of the processing (time for the communication between the nodes etc.) doesn’t “eat up” the speed gains of parallelization (= parallel slowdown).
Of course, the communication overhead is mostly caused by the implementation, but the framework’s specific solution for the communication between the nodes has a noticeable effect as well.
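
To make the limit concrete, here is a minimal sketch of the standard textbook form of Amdahl’s law, speedup = 1 / ((1 - p) + p / n), where p is the parallelizable fraction and n the number of workers. The function names and the 95% parallel fraction are assumptions chosen for illustration, not measurements of Flink or Spark. The plain formula also does not model communication cost that grows with the number of nodes, so it only shows the speedup saturating, not an actual parallel slowdown.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speedup under Amdahl's law (standard textbook formula)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    # Serial part runs at full cost; parallel part is divided among the workers.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def max_speedup(parallel_fraction: float) -> float:
    """Upper bound on the speedup as the number of workers grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return float("inf") if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    p = 0.95  # assumed parallel fraction, purely illustrative
    for n in (1, 8, 64, 512):
        print(f"{n:4d} workers -> speedup {amdahl_speedup(p, n):6.2f}")
    print(f"upper bound for p={p}: {max_speedup(p):.2f}")
```

Under this assumption, adding workers yields less and less additional speedup, because the serial 5% bounds the total gain no matter how many nodes are used.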

After studying these papers, it looks like the possible speedup of Flink is equal to that of Spark, although Flink’s performance is better in many cases:
1. Spark versus Flink - Understanding Performance in Big Data Analytics Frameworks | https://hal.inria.fr/hal-01347638/document
2. Big Data Analytics on Cray XC Series DataWarp using Hadoop, Spark and Flink | https://cug.org/proceedings/cug2016_proceedings/includes/files/pap141.pdf
3. Thrill - High-Performance Algorithmic Distributed Batch Data Processing with C++ | https://panthema.net/2016/0816-Thrill-High-Performance-Algorithmic-Distributed-Batch-Data-Processing-with-CPP/1608.05634v1.pdf

Does someone have …
… more information (or data) on the speedup of Flink applications?
… experience (or data) with Flink in an extremely parallelized environment?
… detailed information on how the nodes communicate, especially when they are waiting for each other’s task results?

Thank you very much for your time & answers!
Hanna