flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gabriele Di Bernardo <gabriele.diberna...@me.com>
Subject Latency on Flink
Date Mon, 05 Jun 2017 12:42:10 GMT
Hello everyone,

I am a completely newcomer of streaming engines and big data platforms but I am considering
using Flink for my master thesis and before starting using it I am trying to do some kind
of evaluation of the system. In particular I am interested in observing how the system reacts
in terms of latency when it receives a big amount of data to process.

I set up a simple application consisting in:
– a Kafka producer that generates data for a Kafka topic; each data message is distinguished
by a source id.

– a Flink consumer app that reads from Kafka and it should apply some kind of reduction
operator to the received data (e.g. calculate MEAN value of the last 1000 elements received).
The Flink consumer keeps the state of the messages coming from a certain source (not sure
if this is the more efficient approach though). 

I run this application on AWS using EMR with a relatively simple configuration:
– EMR Cluster: Master m3.xlarge (4	CPU + 15GiB Memory), 2 Core (2 x m3.xlarge )
– Kafka + Zookeeper running in a m4.xlarge (4CPU + 16GiB memory).

I run the expirement with 2 task managers and 4 slots; I also tried to play with the number
of partitions of the kafka topic but I experienced really high-latency with the increase of
the number of messages generate per seconds by the Kafka producer. With the simple configuration
described above I experienced really high latency when for example my consumer application
generates 5000 double values per seconds; and more messages are created more the latency increases.

I would like to ask you if, even for this super simple experiment, should I scale-out my Flink
and/or Kafka cluster to observe better performance?

If you have time you can check out my simple code at: https://github.com/gdibernardo/streaming-engines-benchmark
<https://github.com/gdibernardo/streaming-engines-benchmark>. If you have any suggestions
regarding how to improve my experiments I'd love to hear from you.

Thank you so much.


View raw message