kafka-dev mailing list archives

From "Eno Thereska (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (KAFKA-4474) Poor kafka-streams throughput
Date Thu, 30 Mar 2017 06:40:41 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-4474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eno Thereska resolved KAFKA-4474.
---------------------------------
    Resolution: Fixed

> Poor kafka-streams throughput
> -----------------------------
>
>                 Key: KAFKA-4474
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4474
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 0.10.1.0
>            Reporter: Juan Chorro
>            Assignee: Eno Thereska
>         Attachments: hctop sreenshot.png, kafka-streams-bug-1.png, kafka-streams-bug-2.png, Performance test results.png
>
>
> Hi!
> I am concerned about kafka-streams throughput.
> I have a single kafka-streams application instance that consumes from the 'input' topic, prints each message to the screen, and produces to the 'output' topic. All topics have 4 partitions. As can be seen, the topology is very simple.
> I produce 120K messages/second to the 'input' topic, but when I measure the 'output' topic I receive only ~4K messages/second. I used the following configuration (all other parameters at their defaults):
> application.id: myApp
> bootstrap.servers: localhost:9092
> zookeeper.connect: localhost:2181
> num.stream.threads: 1
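
For readers who want to reproduce the setup, the four settings quoted above can be written as a plain `java.util.Properties` object, which is the form a Kafka Streams application is normally configured with. This is only a sketch: the class and method names are hypothetical, the keys are copied verbatim from the report, and it deliberately avoids any Kafka classes so that it runs on the standard library alone.

```java
import java.util.Properties;

// Hypothetical helper, not taken from the reporter's application.
class StreamsSettingsSketch {

    // Collects the four settings quoted in the report; everything else is
    // left unset so the library defaults would apply.
    static Properties reportedSettings() {
        Properties props = new Properties();
        props.setProperty("application.id", "myApp");
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("zookeeper.connect", "localhost:2181");
        props.setProperty("num.stream.threads", "1");
        return props;
    }

    public static void main(String[] args) {
        Properties props = reportedSettings();
        // Print each setting in "key: value" form, as in the report.
        for (String key : props.stringPropertyNames()) {
            System.out.println(key + ": " + props.getProperty(key));
        }
    }
}
```

Changing the value stored under `num.stream.threads` is the only difference between the runs within each test case described in the report.
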
> My tests were unsuccessful at first, but when I created a new 'input' topic with 1 partition (keeping the 'output' topic at 4 partitions) I got 120K messages/second in the 'output' topic.
> I then ran performance tests for the following cases (all topics have 4 partitions in every case):
> Case A - 1 Instance:
> - With num.stream.threads set to 1 I had ~3785 messages/second
> - With num.stream.threads set to 2 I had ~3938 messages/second
> - With num.stream.threads set to 4 I had ~120K messages/second
> Case B - 2 Instances:
> - With num.stream.threads set to 1 I had ~3930 messages/second for each instance (total throughput ~8K messages/second)
> - With num.stream.threads set to 2 I had ~3945 messages/second for each instance (roughly the same total throughput as with num.stream.threads set to 1)
> Case C - 4 Instances:
> - With num.stream.threads set to 1 I had 3946 messages/second for each instance (total throughput ~17K messages/second)
> As can be seen, I get the best results when num.stream.threads equals the number of partitions. So I have the following questions:
> - Why do I always get ~4K messages/second when the topic has more than 1 partition and num.stream.threads is set to 1?
> - In case C, 4 instances with num.stream.threads set to 1 should perform better than 1 instance with num.stream.threads set to 4. Is this assumption correct?
> This is the kafka-streams application that I use: https://gist.github.com/Chorro/5522ec4acd1a005eb8c9663da86f5a18
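
To make the test cases above easier to compare, the following sketch computes how many stream tasks each thread would own in every case. It rests on two assumptions that are not stated in the report: that the topology yields one task per 'input' partition, and that tasks are spread as evenly as possible over all threads of all instances. The class and method names are hypothetical, and the sketch does not attempt to explain the measured throughput.

```java
// Hypothetical helper for comparing the test cases; not part of Kafka Streams.
class TaskLayoutSketch {

    // Largest number of tasks a single thread owns, assuming one task per
    // partition and an even spread over every thread of every instance.
    static int maxTasksPerThread(int partitions, int instances, int threadsPerInstance) {
        int totalThreads = instances * threadsPerInstance;
        return (partitions + totalThreads - 1) / totalThreads; // ceiling division
    }

    public static void main(String[] args) {
        int partitions = 4; // every topic in the report has 4 partitions
        // {instances, num.stream.threads} for cases A, B and C in the report
        int[][] cases = { {1, 1}, {1, 2}, {1, 4}, {2, 1}, {2, 2}, {4, 1} };
        for (int[] c : cases) {
            System.out.printf("instances=%d threads=%d: at most %d task(s) per thread%n",
                    c[0], c[1], maxTasksPerThread(partitions, c[0], c[1]));
        }
    }
}
```

Under these assumptions, 1 instance with 4 threads, 2 instances with 2 threads each, and 4 instances with 1 thread each all give one task per thread, which is the comparison the second question is about.
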



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
