flink-user mailing list archives

From Vinaya M S <vinay...@gmail.com>
Subject Re: Comparison of storm and flink
Date Sat, 23 Jan 2016 22:33:17 GMT
Hi Slim Baltagi,

   Thank you for the list you mentioned. It will be really helpful. I have
gone through a few of the materials you mentioned, such as:

1. Benchmarking Streaming Computation Engines at Yahoo!
2. Capital One slides on SlideShare.
3. data Artisans article.

Based on these, I have identified a few metrics:

1. Number of tuples processed per second.
2. Measuring throughput while keeping the number of tuples/second constant.
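
A minimal sketch of how these two metrics could be measured, independent of either engine. The names ThroughputMeter and emission_offsets are hypothetical helpers for illustration only; they are not part of the Storm or Flink APIs, and a real study would hook the counter into the actual spout/source and bolt/sink code.

```python
import time


class ThroughputMeter:
    """Counts processed tuples and reports the average rate in tuples/second.

    Hypothetical helper, not a Storm or Flink class.
    """

    def __init__(self, clock=time.monotonic):
        self._clock = clock        # injectable so the meter can be tested
        self._start = clock()      # measurement starts at construction
        self._count = 0

    def record(self, n=1):
        """Call once per processed tuple, or once per batch of n tuples."""
        self._count += n

    def tuples_per_second(self):
        """Average rate since construction; 0.0 if no time has elapsed."""
        elapsed = self._clock() - self._start
        return self._count / elapsed if elapsed > 0 else 0.0


def emission_offsets(rate_per_sec, n):
    """Offsets (in seconds from start) at which to emit n tuples so that
    the input rate stays constant at rate_per_sec tuples/second."""
    return [i / rate_per_sec for i in range(n)]
```

The same meter would be placed at the reading side and at the writing side of each job, so that both engines are measured the same way under the same constant input rate.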

I'm thinking of comparing read/write throughput. I have to figure out a way
to compare storm::spout ~ flink::env.getstream and storm::ReportBolt ~
flink::sink.

I'm not sure about this yet.

During the seven-week Insight Data Engineering Fellows program we aim to
build a data platform to handle large, real-time datasets. Considering the
short period we spend at Insight working on a project, I don't consider it
to be a full-blown benchmark study. But I want to be careful and would be
willing to work further along those lines.

I have signed up for the meetup happening in NYC, as I consider it to be a
great place to learn about Flink. I am looking forward to your talk, as
well as to meeting you and discussing the questions I have.


Thank you,
Vinaya M S




On Sat, Jan 23, 2016 at 3:14 PM, Slim Baltagi <sbaltagi@gmail.com> wrote:

> Hi Vinaya
>
> 1. Comparing streaming tools (in this case Storm and Flink) should not be
> based on performance benchmarks only! For example, slides 16-36 list over
> 96 criteria that we identified at Capital One to compare two streaming
> tools:
> http://www.slideshare.net/sbaltagi/flink-vs-spark/17
>
> 2. Now, if you are focusing on performance only, I'll suggest a few related
> resources:
>
> - Benchmarking Streaming Computation Engines at Yahoo!
>
> http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
> (December 16, 2015). Code on GitHub:
> https://github.com/yahoo/streaming-benchmarks
>
> - Some Flink contributors have started work on performance scripts for
> Flink, Spark, and MapReduce here: Apache Flink: Performance and Testing
> https://github.com/project-flink/flink-perf
>
> - Some first numbers on performance of streaming jobs with Apache Flink are
> here:
>
> http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
> under the section: 'Show me the numbers'. Code used is at:
> https://github.com/dataArtisans/performance
>
> - Yangjun Wang is currently working on his Master's thesis at Aalto
> University in Helsinki, Finland. The topic of his thesis is building a
> standard benchmark system for stream processing systems like Apache Storm,
> Spark and Flink. Code on GitHub:
> https://github.com/wangyangjun/StreamBench/tree/master/StreamBench
>
> 3. I am giving a talk on Apache Flink in NYC on Tuesday, February 2nd, 2016,
> and I will touch a bit on benchmarks:
>
> http://www.meetup.com/New-York-City-NYC-Apache-Flink-Meetup/events/228113118/
> You are welcome to attend.
>
> Thanks
>
> Slim Baltagi
>
>
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Comparison-of-storm-and-flink-tp4468p4469.html
> Sent from the Apache Flink User Mailing List archive at Nabble.com.
>
