kafka-jira mailing list archives

From "Chen He (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-3554) Generate actual data with specific compression ratio and add multi-thread support in the ProducerPerformance tool.
Date Mon, 26 Jun 2017 17:23:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-3554?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16063452#comment-16063452
] 

Chen He commented on KAFKA-3554:
--------------------------------

Thank you for the quick reply, [~becket_qin]. This work is really valuable. It gives us
a tool that can exercise a Kafka cluster's full capacity. For example, we can measure the lowest
latency by using only one thread, and by increasing the number of threads we can find the maximum
throughput of a Kafka cluster.

I have only one question. I applied this patch to the latest Kafka and compared the results with the old
ProducerPerformance.java file. I found that with acks=all and snappy compression, sending
100M records (100B each), it does not perform as well as the old ProducerPerformance.java file.

> Generate actual data with specific compression ratio and add multi-thread support in
the ProducerPerformance tool.
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-3554
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3554
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.9.0.1
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>             Fix For: 0.11.1.0
>
>
> Currently the ProducerPerformance tool always generates the payload with the same bytes. This does
not work well for testing compressed data because the payload is extremely compressible no
matter how big the payload is.
> We can make some changes to make it more useful for compressed messages. Currently I
am generating the payload from integers drawn from a given range. By adjusting the range of
the integers, we can get different compression ratios.
> API-wise, we can either let the user specify the integer range or the expected compression
ratio (we will do some probing to find the corresponding range for the user).
> Besides that, in many cases it is useful to have multiple producer threads when the
producer threads themselves are the bottleneck. Admittedly people can run multiple ProducerPerformance
instances to achieve a similar result, but that is still different from the real case when people actually
use the producer.
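
The payload idea in the description above can be sketched roughly as follows. This is not the code from the patch. The class name PayloadSketch, its method names, the 64 KB sample size, and the power-of-two probing are all hypothetical choices made for illustration. It uses java.util.zip.Deflater as a stand-in for snappy so that it runs with the JDK alone, and it assumes "compression ratio" means compressed size divided by original size.

```java
import java.nio.ByteBuffer;
import java.util.Random;
import java.util.zip.Deflater;

// Illustrative sketch only; names and structure are assumptions, not the patch.
public class PayloadSketch {

    // Fill a payload with 4-byte integers drawn uniformly from [0, range).
    // A smaller range leaves more repeated bytes, so the payload compresses better.
    static byte[] generatePayload(int recordSize, int range, Random random) {
        ByteBuffer buffer = ByteBuffer.allocate(recordSize);
        while (buffer.remaining() >= Integer.BYTES) {
            buffer.putInt(random.nextInt(range));
        }
        // Any trailing bytes (recordSize not a multiple of 4) stay zero.
        return buffer.array();
    }

    // Compressed size divided by original size, measured with Deflater
    // (a stand-in for snappy, chosen so the sketch needs only the JDK).
    static double compressionRatio(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        byte[] out = new byte[1024];
        long compressedBytes = 0;
        while (!deflater.finished()) {
            compressedBytes += deflater.deflate(out);
        }
        deflater.end();
        return (double) compressedBytes / data.length;
    }

    // Probe power-of-two ranges from 2 up to 2^30 and return the first one
    // whose sample payload reaches the target ratio. Falls back to 2^30 if
    // the target is never reached.
    static int probeRange(double targetRatio, int sampleSize, long seed) {
        for (int bits = 1; bits <= 30; bits++) {
            int range = 1 << bits;
            byte[] sample = generatePayload(sampleSize, range, new Random(seed));
            if (compressionRatio(sample) >= targetRatio) {
                return range;
            }
        }
        return 1 << 30;
    }

    public static void main(String[] args) {
        int sampleSize = 64 * 1024;
        for (int bits : new int[] {1, 8, 16, 30}) {
            byte[] sample = generatePayload(sampleSize, 1 << bits, new Random(42L));
            System.out.printf("range=2^%d ratio=%.3f%n", bits, compressionRatio(sample));
        }
        System.out.println("range for target 0.5: " + probeRange(0.5, sampleSize, 42L));
    }
}
```

The ratio measured this way depends on the codec and on the sample size, so a range probed with Deflater would need to be probed again with the codec the producer is actually configured to use.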



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
