flink-user mailing list archives

From Dominik Safaric <dominiksafa...@gmail.com>
Subject Re: benchmarking flink streaming
Date Wed, 25 Jan 2017 17:08:42 GMT
Hi Stephan,

Since I’m already familiar with the latency markers of Flink 1.2, one question about them
still bothers me: how does Flink measure end-to-end latency for operations such as
aggregations?

Suppose you have a topology that ingests data from Kafka and outputs the frequency per key.
In this case, the sink only receives tuples of (key: String, frequency: Int), so it has no
direct link back to the individual input records.
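
Below is a minimal, framework-agnostic sketch of the "attach your own timestamps" approach for this aggregation case. It is written in Python for brevity and does not use any Flink API. It assumes three things: each record is stamped with its ingestion time at the source, the keyed aggregation keeps the oldest and newest ingestion time of the records it has absorbed alongside the count, and the sink subtracts those times from its own clock. All names (`stamp`, `CountingAggregator`, `sink_latencies`) are hypothetical. In a real job, the same logic would sit in the source (or a map right after it), in the keyed aggregation, and in the sink.

```python
# Framework-agnostic sketch (NOT Flink API); all names here are hypothetical.
import time
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Record = Tuple[str, float]              # (key, ingest_ms)
Result = Tuple[str, int, float, float]  # (key, frequency, oldest_ingest_ms, newest_ingest_ms)


def stamp(key: str, ingest_ms: Optional[float] = None) -> Record:
    """Source side: attach the wall-clock ingestion time (ms) to a record."""
    if ingest_ms is None:
        ingest_ms = time.time() * 1000.0
    return (key, ingest_ms)


@dataclass
class KeyState:
    count: int = 0
    oldest_ingest_ms: float = float("inf")
    newest_ingest_ms: float = float("-inf")


class CountingAggregator:
    """Keyed frequency count that also remembers the timestamp range of its inputs."""

    def __init__(self) -> None:
        self.state: Dict[str, KeyState] = {}

    def add(self, record: Record) -> None:
        key, ingest_ms = record
        s = self.state.setdefault(key, KeyState())
        s.count += 1
        s.oldest_ingest_ms = min(s.oldest_ingest_ms, ingest_ms)
        s.newest_ingest_ms = max(s.newest_ingest_ms, ingest_ms)

    def emit(self) -> List[Result]:
        """Emit one result per key (sorted by key) and reset, like a window firing."""
        out = [(k, s.count, s.oldest_ingest_ms, s.newest_ingest_ms)
               for k, s in sorted(self.state.items())]
        self.state.clear()
        return out


def sink_latencies(results: List[Result], now_ms: float) -> Dict[str, Tuple[float, float]]:
    """Sink side: (max, min) latency in ms of the records behind each aggregate."""
    return {key: (now_ms - oldest, now_ms - newest)
            for key, _freq, oldest, newest in results}


# Usage with fixed timestamps so the numbers are reproducible.
agg = CountingAggregator()
for key, t in [("a", 100.0), ("b", 120.0), ("a", 150.0)]:
    agg.add(stamp(key, t))
results = agg.emit()
print(results)                         # [('a', 2, 100.0, 150.0), ('b', 1, 120.0, 120.0)]
print(sink_latencies(results, 200.0))  # {'a': (100.0, 50.0), 'b': (80.0, 80.0)}
```

Emitting both bounds is a deliberate choice. The oldest timestamp gives the longest time any counted record waited before its aggregate reached the sink, and the newest gives the shortest. Comparing timestamps taken on different machines assumes their clocks are reasonably synchronized.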

> On 25 Jan 2017, at 16:11, Stephan Ewen <sewen@apache.org> wrote:
> 
> Hi!
> 
> There are new latency metrics in Flink 1.2 that you can use. They are sampled, so they do
> not cover every record.
> 
> You can always attach your own timestamps in order to measure the latency of specific
> records.
> 
> Stephan
> 
> 
> On Fri, Dec 16, 2016 at 5:02 PM, Meghashyam Sandeep V <vr1meghashyam@gmail.com> wrote:
> Hi Stephan,
> 
> Thanks for your answer. Is there a way to get metrics such as the latency of each message
> in the stream? For example, I have a Kafka source, a Cassandra sink, and some processing in
> between. I would like to know how long each message takes from the beginning (entering
> Flink streaming from Kafka) to the end (sending/executing the query).
> 
> On Fri, Dec 16, 2016 at 7:36 AM, Stephan Ewen <sewen@apache.org> wrote:
> Hi!
> 
> I am not sure a recommended benchmarking tool exists. Performance comparisons depend
> heavily on the scenarios you are looking at: simple event processing, shuffles (grouping
> aggregations), joins, small state, large state, etc.
> 
> As far as I know, most people try to write a "mock" version of a job that is representative
> of the jobs they want to run, and test with that.
> 
> That said, I agree that it would be helpful to collect some jobs into an "evaluation
> suite".
> 
> Stephan
> 
> 
> 
> On Thu, Dec 15, 2016 at 6:11 PM, Meghashyam Sandeep V <vr1meghashyam@gmail.com> wrote:
> Hi There,
> 
> We are evaluating Flink streaming for real-time data analysis. I have my Flink job running
> on EMR with YARN. Which benchmarking tools work best with Flink? I couldn't find this
> information on the Apache website.
> 
> Thanks,
> Sandeep
> 
> 
> 

