flink-user mailing list archives

From Robert Metzger <rmetz...@apache.org>
Subject Re: Creating a representative streaming workload
Date Wed, 18 Nov 2015 10:14:16 GMT
Hey Vasia,

I think a very common workload would be an event stream from web servers of
an online shop. Usually, these shops have multiple servers, so events
arrive out of order.
I think there are plenty of different use cases that you can build around
that data:
- Users perform different actions that a streaming system could track
(analysis of click-paths).
- Some simple statistics using windows (items sold in the last 10 minutes).
- Maybe fraud detection would be another use case.
- Often, there also needs to be a sink to HDFS or another file system for a
long-term archive.
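
To make the windowed statistic concrete, here is a rough sketch in plain Python (not Flink API). It counts items sold per 10-minute window using the event's own timestamp, so a late-arriving event still lands in the right window. The tuple layout, the "purchase" action name and the sample data are all made up for illustration, and I use tumbling windows as a simplification. A real streaming job would also have to decide when a window is complete (e.g. with watermarks), which this sketch ignores.

```python
from collections import defaultdict

WINDOW_SECONDS = 600  # 10 minutes


def items_sold_per_window(events, window_seconds=WINDOW_SECONDS):
    """Sum purchased quantities per tumbling event-time window.

    `events` is an iterable of (event_time_seconds, action, quantity)
    tuples in arrival order, which may differ from event-time order.
    Returns a dict mapping window start (seconds) to items sold.
    """
    counts = defaultdict(int)
    for event_time, action, quantity in events:
        if action != "purchase":
            continue
        # Assign by event time, not arrival position, so late events
        # are still counted in the window they belong to.
        window_start = (event_time // window_seconds) * window_seconds
        counts[window_start] += quantity
    return dict(counts)


# Hypothetical sample: the third event arrives after the second one
# but carries an earlier timestamp.
sample = [
    (30, "purchase", 2),
    (650, "purchase", 1),
    (590, "purchase", 3),
    (700, "view", 0),
    (1210, "purchase", 4),
]
result = items_sold_per_window(sample)
```

With the sample above, the late event at t=590 is counted in the first window rather than the second.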

I would love to see such an event generator in Flink's contrib module. I
think that's something the entire streaming space could use.
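
Just to sketch what I mean by an event generator (plain Python, only an illustration, nothing like this exists in Flink as far as this mail is concerned): each event gets a timestamp at a simulated web server, then a random delay decides the arrival order, which gives a stream with bounded out-of-orderness. All names here (ShopEvent, generate_events, the action names, the parameters) are invented for the example.

```python
import random
from dataclasses import dataclass

ACTIONS = ("view", "add_to_cart", "purchase")


@dataclass(frozen=True)
class ShopEvent:
    event_time: int  # seconds since start, assigned at the web server
    server_id: int
    user_id: int
    action: str


def generate_events(n, num_servers=3, num_users=100, max_delay=5, seed=42):
    """Return n events in arrival order.

    One event is created per second of event time. Each event is
    delayed by a random amount in [0, max_delay] seconds, and the
    result is sorted by arrival time, so events can overtake each
    other by at most max_delay seconds.
    """
    rng = random.Random(seed)  # fixed seed keeps the workload reproducible
    timed = []
    for t in range(n):
        event = ShopEvent(
            event_time=t,
            server_id=rng.randrange(num_servers),
            user_id=rng.randrange(num_users),
            action=rng.choice(ACTIONS),
        )
        arrival = t + rng.uniform(0, max_delay)
        timed.append((arrival, t, event))
    timed.sort(key=lambda item: (item[0], item[1]))
    return [event for _, _, event in timed]


events = generate_events(1000)
```

The fixed seed is deliberate: for a benchmark workload you want every system under test to see exactly the same stream.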

On Mon, Nov 16, 2015 at 8:22 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:

> All those should apply for streaming too...
> On Mon, Nov 16, 2015 at 11:06 AM, Vasiliki Kalavri <
> vasilikikalavri@gmail.com> wrote:
>> Hi,
>> thanks Nick and Ovidiu for the links!
>> Just to clarify, we're not looking into creating a generic streaming
>> benchmark. We have quite limited time and resources for this project. What
>> we want is to decide on a set of 3-4 _common_ streaming applications. To
>> give you an idea, for the batch workload, we will pick something like a
>> grep, one relational application, a graph algorithm, and an ML algorithm.
>> Cheers,
>> -Vasia.
>> On 16 November 2015 at 19:25, Ovidiu-Cristian MARCU <
>> ovidiu-cristian.marcu@inria.fr> wrote:
>>> Regarding Flink vs Spark / Storm you can check here:
>>> http://www.sparkbigdata.com/102-spark-blog-slim-baltagi/14-results-of-a-benchmark-between-apache-flink-and-apache-spark
>>> Best regards,
>>> Ovidiu
>>> On 16 Nov 2015, at 15:21, Vasiliki Kalavri <vasilikikalavri@gmail.com>
>>> wrote:
>>> Hello squirrels,
>>> with some colleagues and students here at KTH, we have started 2
>>> projects to evaluate (1) performance and (2) behavior in the presence of
>>> memory interference in cloud environments, for Flink and other systems. We
>>> want to provide our students with a workload of representative applications
>>> for testing.
>>> While for batch applications, it is quite clear to us what classes of
>>> applications are widely used and how to create a workload of different
>>> types of applications, we are not quite sure about the streaming workload.
>>> That's why we'd like your opinions! If you're using Flink streaming in
>>> your company or your project, we'd love your input even more :-)
>>> What kind of applications would you consider as "representative" of a
>>> streaming workload? Have you run any experiments to evaluate Flink versus
>>> Spark, Storm etc.? If yes, would you mind sharing your code with us?
>>> We will of course be happy to share our results with everyone after we
>>> have completed our study.
>>> Thanks a lot!
>>> -Vasia.
