samza-dev mailing list archives

From Chris Riccomini <criccom...@linkedin.com>
Subject RE: About SAMZA performance
Date Mon, 12 Aug 2013 03:46:44 GMT
Hey Sean,

Thanks for the interest. We haven't done anything rigorous on the performance-testing side
of things.

Jay has a little perf test to exercise a few things, and it gets in the 280k messages/sec
range, but that number is pretty meaningless without context. He can probably speak more about what the
perf test does, how big the messages are, whether it hits the Kafka broker, etc.

As far as upcoming perf work goes, the big thing is eliminating some concurrent data structures
(queues/maps). HProf shows that this is where the largest share (>20%) of our CPU cycles goes. This can
be done once Kafka's consumer API has been cleaned up a bit, which is a work in progress (https://cwiki.apache.org/confluence/display/KAFKA/Client+Rewrite).

The theoretical max throughput we could achieve when using Kafka with Samza would be something
along the lines of the numbers in Kafka's consumer/producer performance tests, but I'm sure
we're not near that (yet). See the grid at the bottom of the page here: https://cwiki.apache.org/confluence/display/KAFKA/Performance+testing

Our largest job is currently processing about 13 megs/sec at peak, spread across 5 containers,
but the need for 5 containers has more to do with memory requirements than with throughput
requirements at this point.

I'm sorry I can't be more specific; it's just been "fast enough" so far. This is something
we should take seriously. I've opened up a JIRA to track the creation of a performance test
suite:

    https://issues.apache.org/jira/browse/SAMZA-6

Feel free to add yourself as a watcher to keep tabs on progress.

Cheers,
Chris
________________________________________
From: Sean Zhong(clockfly) [clockfly@gmail.com]
Sent: Sunday, August 11, 2013 8:08 PM
To: dev@samza.incubator.apache.org
Subject: About SAMZA performance

Hi, SAMZA Developers,

Have you done any performance comparisons on SAMZA, including throughput and
latency?

I am very curious to see the performance difference compared with Storm or
Spark Streaming.

Sean
