Mailing-List: contact user-help@flink.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@flink.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CANZa=GuHbtMfPXwNbFTJShbHvM8uQSJ+i0BFrY89q2SY2BixxQ@mail.gmail.com>
References: 
 <CAJZ2dcVRUhDWgm0DQk7e7TEEg5chdWhSUtXptpBULP6pND+JBw@mail.gmail.com>
 <0DB9D25D-33AF-4F03-A2F4-FBF3F6B2013E@inria.fr>
 <CAJZ2dcVE5V4Jt44COR3PTYcYNLmdmTYSfRaOyE2JJYuVZQx+Hw@mail.gmail.com>
 <CANZa=GuHbtMfPXwNbFTJShbHvM8uQSJ+i0BFrY89q2SY2BixxQ@mail.gmail.com>
From: Robert Metzger <rmetzger@apache.org>
Date: Wed, 18 Nov 2015 11:14:16 +0100
Message-ID: 
 <CAGr9p8DiBuKMQZ8wfcqnrX=RtQRevVFPAcOGH_qh1_MHxcDd0A@mail.gmail.com>
Subject: Re: Creating a representative streaming workload
To: "user@flink.apache.org" <user@flink.apache.org>
Content-Type: multipart/alternative; boundary=001a113eb83ecbe4380524cde87f

--001a113eb83ecbe4380524cde87f
Content-Type: text/plain; charset=UTF-8

Hey Vasia,

I think a very common workload would be an event stream from web servers of
an online shop. Such shops usually run multiple web servers, so events
arrive at the streaming system out of order.
I think there are plenty of different use cases you can build around that
data:
- Tracking the different actions users perform (click-path analysis).
- Simple statistics using windows (e.g., items sold in the last 10 minutes).
- Possibly fraud detection.
- A sink to HDFS or another file system for long-term archiving, which is
  often needed as well.
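
As a rough sketch of the windowed-statistics item, the following Python
function sums items sold per 10-minute tumbling event-time window while
tolerating out-of-order arrival. It mimics the idea of a watermark with
bounded out-of-orderness (watermark = highest event time seen minus an
allowed delay) in plain Python rather than using Flink's API. The
`(event_time_seconds, quantity)` input format, the 10-minute window, and the
60-second bound are assumptions chosen for illustration.

```python
from collections import defaultdict

WINDOW_SECONDS = 600     # assumed: 10-minute tumbling windows
MAX_DELAY_SECONDS = 60   # assumed: events are at most 60 s late

def windowed_sales(events, window=WINDOW_SECONDS, max_delay=MAX_DELAY_SECONDS):
    """events: (event_time_seconds, quantity) pairs in arrival order.

    Returns ([(window_start, total_quantity), ...], dropped). A window is
    emitted once the watermark (max event time seen - max_delay) reaches its
    end; events arriving after the watermark passed their window are dropped.
    """
    open_windows = defaultdict(int)  # window_start -> running total
    watermark = float("-inf")
    results, dropped = [], 0
    for ts, qty in events:
        start = ts - ts % window     # tumbling window this event belongs to
        if start + window <= watermark:
            dropped += 1             # too late: watermark already passed its window end
            continue
        open_windows[start] += qty
        watermark = max(watermark, ts - max_delay)
        # Fire every window whose end is covered by the watermark, oldest first.
        for s in sorted(w for w in open_windows if w + window <= watermark):
            results.append((s, open_windows.pop(s)))
    # End of input: flush what is left, like a final watermark at +infinity.
    for s in sorted(open_windows):
        results.append((s, open_windows.pop(s)))
    return results, dropped
```

For the arrival sequence [(10, 1), (590, 2), (650, 1), (550, 3), (1300, 1),
(100, 5)] this returns [(0, 6), (600, 1), (1200, 1)] with one dropped event:
the sale at 550 arrives late but within the 60-second bound and is still
counted, while the sale at 100 arrives after its window has fired.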

I would love to see such an event generator in Flink's contrib module. I
think that's something the entire streaming space could use.
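
Below is a minimal Python sketch of what such a generator could look like,
assuming a simple click-path model (a few item views, optionally followed by
add-to-cart and purchase), sticky sessions on one of several web servers,
and a uniformly distributed per-event transport delay that produces the
out-of-order arrival described above. The `ShopEvent` schema, the
conversion rates, and the function names are hypothetical; this is not an
existing Flink contrib module.

```python
import json
import random
from dataclasses import asdict, dataclass

@dataclass
class ShopEvent:
    user_id: int
    action: str          # "view", "add_to_cart" or "purchase"
    item_id: int
    server: int          # web server that logged the event
    event_time: float    # seconds: when the user clicked
    arrival_time: float  # seconds: when the stream processor receives it

def generate_events(num_users=50, num_servers=4, max_delay=5.0, seed=1):
    """Hypothetical generator: one click path per user on one server, plus a
    random per-event transport delay, so the merged stream is out of order."""
    rng = random.Random(seed)  # seeded so benchmark runs are reproducible
    events = []
    for user in range(num_users):
        server = rng.randrange(num_servers)  # assumed sticky sessions
        t = rng.uniform(0.0, 600.0)          # session starts in the first 10 min
        path = ["view"] * rng.randint(1, 3)  # browse a few items
        if rng.random() < 0.3:               # assumed 30% add-to-cart rate
            path.append("add_to_cart")
            if rng.random() < 0.5:           # assumed 50% of carts are bought
                path.append("purchase")
        item = None
        for action in path:
            if action == "view":
                item = rng.randint(1, 100)   # cart/purchase reuse the last viewed item
            t += rng.uniform(1.0, 30.0)      # think time between clicks
            arrival = t + rng.uniform(0.0, max_delay)
            events.append(ShopEvent(user, action, item, server, t, arrival))
    events.sort(key=lambda e: e.arrival_time)  # the processor sees arrival order
    return events

def to_json_lines(events):
    """One JSON object per line, e.g. for a file sink or a message queue."""
    return "\n".join(json.dumps(asdict(e)) for e in events)

if __name__ == "__main__":
    stream = generate_events()
    print(to_json_lines(stream[:3]))
```

Feeding the `purchase` events into a windowed aggregation would give the
"items sold in the last 10 minutes" statistic, and the JSON lines could be
written to a file sink for the archiving use case.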


On Mon, Nov 16, 2015 at 8:22 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:

> All those should apply for streaming too...
>
> On Mon, Nov 16, 2015 at 11:06 AM, Vasiliki Kalavri <
> vasilikikalavri@gmail.com> wrote:
>
>> Hi,
>>
>> Thanks Nick and Ovidiu for the links!
>>
>> Just to clarify, we're not looking into creating a generic streaming
>> benchmark. We have quite limited time and resources for this project. What
>> we want is to decide on a set of 3-4 _common_ streaming applications. To
>> give you an idea, for the batch workload, we will pick something like a
>> grep, one relational application, a graph algorithm, and an ML algorithm.
>>
>> Cheers,
>> -Vasia.
>>
>> On 16 November 2015 at 19:25, Ovidiu-Cristian MARCU <
>> ovidiu-cristian.marcu@inria.fr> wrote:
>>
>>> Regarding Flink vs. Spark/Storm, you can check here:
>>> http://www.sparkbigdata.com/102-spark-blog-slim-baltagi/14-results-of-a-benchmark-between-apache-flink-and-apache-spark
>>>
>>> Best regards,
>>> Ovidiu
>>>
>>> On 16 Nov 2015, at 15:21, Vasiliki Kalavri <vasilikikalavri@gmail.com>
>>> wrote:
>>>
>>> Hello squirrels,
>>>
>>> With some colleagues and students here at KTH, we have started two
>>> projects to evaluate (1) performance and (2) behavior in the presence of
>>> memory interference in cloud environments, for Flink and other systems. We
>>> want to provide our students with a workload of representative applications
>>> for testing.
>>>
>>> For batch applications, it is quite clear to us which classes of
>>> applications are widely used and how to create a workload mixing different
>>> types of applications, but we are not quite sure about the streaming
>>> workload.
>>>
>>> That's why we'd like your opinions! If you're using Flink streaming in
>>> your company or your project, we'd love your input even more :-)
>>>
>>> What kinds of applications would you consider "representative" of a
>>> streaming workload? Have you run any experiments to evaluate Flink versus
>>> Spark, Storm, etc.? If so, would you mind sharing your code with us?
>>>
>>> We will of course be happy to share our results with everyone after we
>>> have completed our study.
>>>
>>> Thanks a lot!
>>> -Vasia.
>>>
>>>
>>>
>>
>


--001a113eb83ecbe4380524cde87f--