From: Gabriele Di Bernardo
Subject: Latency on Flink
Date: Mon, 05 Jun 2017 14:42:10 +0200
To: user@flink.apache.org

Hello everyone,

I am a complete newcomer to streaming engines and big data platforms, but I am considering using Flink for my master's thesis, and before I start using it I am trying to evaluate the system. In particular, I am interested in observing how the system behaves in terms of latency when it receives a large amount of data to process.

I set up a simple application consisting of:
– a Kafka producer that generates data for a Kafka topic; each data message is identified by a source id.

– a Flink consumer app that reads from Kafka and applies a reduction operator to the received data (e.g. calculating the mean of the last 1000 elements received). The Flink consumer keeps state for the messages coming from each source (not sure if this is the most efficient approach though).
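
To make the per-source reduction concrete, here is a minimal sketch in plain Python of a mean over the last N values kept separately for each source id. It is only an illustration of the idea, not Flink code and not the code in the repository; the class name `RollingMeanBySource` and its methods are hypothetical.

```python
from collections import defaultdict, deque


class RollingMeanBySource:
    """Keeps, per source id, the mean of the last `window` values seen.

    Hypothetical helper for illustration only; it does not use any Flink API.
    """

    def __init__(self, window=1000):
        self.window = window
        # Per-source state: the retained values and their running sum.
        self._values = defaultdict(deque)
        self._sums = defaultdict(float)

    def add(self, source_id, value):
        """Record one value for a source and return that source's current mean."""
        values = self._values[source_id]
        values.append(value)
        self._sums[source_id] += value
        if len(values) > self.window:
            # Evict the oldest value so only the last `window` values remain.
            self._sums[source_id] -= values.popleft()
        return self._sums[source_id] / len(values)
```

Keeping a running sum makes each update constant-time instead of re-summing the whole window, at the cost of possible floating-point drift over very long runs.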

I run this application on AWS using EMR with a relatively simple configuration:
– EMR cluster: master m3.xlarge (4 CPU + 15 GiB memory), 2 core nodes (2 x m3.xlarge)
– Kafka + ZooKeeper running on an m4.xlarge (4 CPU + 16 GiB memory).

I run the experiment with 2 task managers and 4 slots; I also tried varying the number of partitions of the Kafka topic, but I experienced really high latency as the number of messages generated per second by the Kafka producer increased. With the simple configuration described above I experienced really high latency when, for example, my consumer application generates 5000 double values per second; the more messages are created, the more the latency increases.
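
For reference, one way the latency figures above could be computed is sketched below in plain Python. It assumes that each message carries the producer's emission timestamp in milliseconds and that the producer and consumer clocks are synchronized, neither of which is stated above; the function names are hypothetical.

```python
import math


def latency_ms(emitted_at_ms, processed_at_ms):
    """End-to-end latency of one message, in milliseconds.

    Assumes both timestamps come from synchronized clocks.
    """
    return processed_at_ms - emitted_at_ms


def percentile(sorted_values, p):
    """Nearest-rank percentile (0 < p <= 100) of an ascending, non-empty list."""
    rank = max(1, math.ceil(p * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def summarize(latencies_ms):
    """Median, 99th percentile and maximum of a non-empty list of latencies."""
    ordered = sorted(latencies_ms)
    return {
        "p50": percentile(ordered, 50),
        "p99": percentile(ordered, 99),
        "max": ordered[-1],
    }
```

Reporting percentiles rather than only an average makes it easier to see whether latency is high for every message or only for a slow tail.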

I would like to ask: even for this super simple experiment, should I scale out my Flink and/or Kafka cluster to observe better performance?

If you have time you can check out my simple code at: https://github.com/gdibernardo/streaming-engines-benchmark. If you have any suggestions regarding how to improve my experiments I'd love to hear from you.

Thank you so much.

Best,


Gabriele
