kafka-dev mailing list archives

From Adam Bellemare <adam.bellem...@gmail.com>
Subject Re: Current Kafka Streams and KSQL Performance Metrics / Benchmarks?
Date Thu, 23 Aug 2018 12:11:37 GMT
Thanks Guozhang!

I am asking primarily because I have seen Flink & Spark Streaming users
boasting of processing millions of records / second, and I was interested
to learn where Kafka Streams / KSQL stands. This would also help a lot in
capacity planning for teams looking to use Kafka Streams. Is there a
general rule of thumb for the upper bound on performance of a Kafka Streams
app for, say, a single stream-table join?


As for KIP-213... I need something else to start taking a look at while it
awaits review :).

Thanks, I'll look into what you posted.

Adam


On Wed, Aug 22, 2018 at 7:42 PM, Guozhang Wang <wangguoz@gmail.com> wrote:

> Hello Adam,
>
> Thanks for your interest in working on potential Kafka Streams / KSQL
> performance improvements (I thought the non-key joining would take most of
> your time :P )
>
> Currently there are no published performance numbers for the latest
> versions of Streams AFAIK. Personally, I have run the Streams SimpleBenchmark
> (https://github.com/apache/kafka/blob/trunk/tests/kafkatest/benchmarks/streams/streams_simple_benchmark_test.py)
> and profiled it when necessary to find the performance bottlenecks. If you
> are interested, you can follow a similar approach; there are also some open
> JIRAs for potential performance improvements:
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20KAFKA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20%22Patch%20Available%22)%20AND%20component%20%3D%20%22streams%22%20%20AND%20labels%20%3D%20performance%20%20
>
>
> Guozhang
>
> On Wed, Aug 22, 2018 at 7:02 AM, Adam Bellemare <adam.bellemare@gmail.com> wrote:
>
> > Blog post in question:
> > https://www.confluent.io/blog/ksql-february-release-streaming-sql-for-apache-kafka/
> >
> > On Wed, Aug 22, 2018 at 10:01 AM, Adam Bellemare <adam.bellemare@gmail.com> wrote:
> >
> > > Hi All
> > >
> > > I am looking for performance metrics related to Kafka Streams and KSQL.
> > > I have been scouring various blogs, including the Confluent one, looking
> > > for any current performance metrics or benchmarks, official or
> > > otherwise, on both Kafka Streams and KSQL for Kafka 2.x+. Unfortunately,
> > > almost everything I am finding is for 0.x.
> > >
> > > In this particular blog post on KSQL, there is the following quotation:
> > >
> > > > For example, our soak testing cluster has racked up over 1,000 hours
> > > > and runs KSQL workloads 24×7. The performance tests we conduct allow
> > > > us to understand performance characteristics of stateless and stateful
> > > > KSQL queries. We currently run over 42 different tests that collect
> > > > more than 700 metrics.
> > >
> > > I assume that there is also some information related to Kafka Streams
> > > from similar tests. Does anyone know where I can find these results? Or
> > > does anyone have any blog posts or other materials that look at the
> > > performance of either one of these for Kafka 2.x?
> > >
> > > For context, I am asking this question to get a better understanding of
> > > current Kafka Streams / KSQL performance, so that contributors can
> > > better prioritize performance-related improvements vs. feature-related
> > > improvements.
> > >
> > > Thanks
> > > Adam
> > >
> >
>
>
>
> --
> -- Guozhang
>
