kafka-dev mailing list archives

From "Jiangjie Qin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-3565) Producer's throughput lower with compressed data after KIP-31/32
Date Sun, 08 May 2016 00:30:12 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15275431#comment-15275431 ]

Jiangjie Qin commented on KAFKA-3565:
-------------------------------------

[~junrao] I think I figured out the reason why the 0.9 consumer has better performance than trunk:
the recompression on the broker side in 0.9 is more efficient than the streaming compression
on the producer side.
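
To illustrate the difference, here is a minimal, self-contained sketch (not part of the benchmark; it assumes snappy-java on the classpath, which is the library Kafka uses for snappy). It compares one-shot block compression of a whole batch, which is roughly what broker-side recompression amounts to, against streaming compression of the same data in small independently compressed chunks; the 1 KB chunk size is only there to make the effect visible and is not what the producer actually uses:
{code}
import org.xerial.snappy.Snappy;
import org.xerial.snappy.SnappyOutputStream;

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Random;

public class BlockVsStreamSnappy {
    public static void main(String[] args) throws Exception {
        // Roughly mimic the benchmark data: ~100-byte values drawn from a
        // bounded range (valueBound=500), appended into one ~10 KB batch.
        Random random = new Random(42);
        ByteArrayOutputStream batch = new ByteArrayOutputStream();
        for (int i = 0; i < 100; i++) {
            String value = String.format("%0100d", random.nextInt(500));
            batch.write(value.getBytes(StandardCharsets.UTF_8));
        }
        byte[] uncompressed = batch.toByteArray();

        // One-shot block compression over the entire batch, analogous to the
        // broker recompressing a whole message set after assigning offsets.
        byte[] wholeBatch = Snappy.compress(uncompressed);

        // Streaming compression with a small block size, so each 1 KB chunk is
        // compressed independently and cross-chunk redundancy is lost.
        ByteArrayOutputStream streamed = new ByteArrayOutputStream();
        try (SnappyOutputStream out = new SnappyOutputStream(streamed, 1024)) {
            out.write(uncompressed);
        }

        System.out.printf("uncompressed=%d  whole-batch=%d  streamed(1KB blocks)=%d%n",
                uncompressed.length, wholeBatch.length, streamed.size());
    }
}
{code}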

For the setting using snappy compression, message size 100 B, and valueBound 500, both trunk and
0.9 report roughly the same batch size on the producer side.
{noformat}
Producer_Test
Select_Rate:	784.10	689.02
Batch_Size_Avg:	10625.79	10204.10
Request_Size_Avg:	85144.37	81771.16
Request_Latency_Avg:	4.41	6.77
Request_Rate:	114.30	99.33
Records_Per_Request_Avg:	801.00	801.00
Record_Queue_Time:	4.09	3.07
Compression_Rate_Avg:	0.79	0.81
92395.823709 records/sec (8.81 MB/sec), 6.52 ms avg latency, 436.00 ms max latency, 6 ms 50th,
9 ms 95th, 9 ms 99th, 17 ms 99.9th
79507.056251 records/sec (7.58 MB/sec), 8.43 ms avg latency, 220.00 ms max latency, 8 ms 50th,
11 ms 95th, 11 ms 99th, 18 ms 99.9th.
Consumer_Test
start.time, end.time, data.consumed.in.MB, MB.sec, data.consumed.in.nMsg, nMsg.sec
16:14:48:796, 16:15:07:793, 953.6743, 50.2013, 10000000, 526398.9051 
16:17:17:637, 16:17:33:701, 953.6743, 59.3672, 10000000, 622509.9602
----------------------
max.in.flight.requests.per.connection=1, valueBound=500, linger.ms=100000, messageSize=100,
compression.type=snappy
{noformat}
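
For reference, a run with the producer settings above can be launched with the stock perf tool along these lines (the broker address and topic name are placeholders; valueBound comes from the modified benchmark used for these tests and has no flag in the stock tool, so it is omitted here):
{noformat}
bin/kafka-producer-perf-test.sh --topic test --num-records 10000000 \
  --record-size 100 --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092 compression.type=snappy \
    linger.ms=100000 max.in.flight.requests.per.connection=1
{noformat}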

But after dumping the log on the broker side, I see that after recompression the shallow messages
on the 0.9 broker become ~8K, while the trunk broker still has ~10K shallow messages.
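
For reference, the shallow message sizes can be checked by dumping the partition log with the stock DumpLogSegments tool (the log path below is just an example); without --deep-iteration it prints one entry per shallow, i.e. wrapper, message, including its size:
{noformat}
bin/kafka-run-class.sh kafka.tools.DumpLogSegments \
  --files /tmp/kafka-logs/test-0/00000000000000000000.log
{noformat}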

I ran the tests with lz4 as well. The results are updated in test runs 16 and 17. I did not
see the issue we hit with snappy: although the sizes of the shallow messages change a little
after broker-side recompression, they are roughly the same as the producer-side batch size.

I did not see this problem when the value bound is 5000. So it seems the better compression
ratio that batch compression on the broker side achieves with snappy for certain data patterns
is the reason for the performance gap we saw in the test. I listed below the batch sizes before
and after recompression for snappy with different settings:

{noformat}
Producer Batch Size Avg:           10204.49
Broker batch size:  ~8.0K
----------------------
max.in.flight.requests.per.connection=1, valueBound=500, linger.ms=100000, messageSize=100,
compression.type=gzip

Producer Batch Size Avg:           9107.23
Broker batch size: ~6.6K
----------------------
max.in.flight.requests.per.connection=1, valueBound=500, linger.ms=100000, messageSize=1000,
compression.type=snappy

Producer Batch Size Avg:           11457.56
Broker batch size: ~10.5K
----------------------
max.in.flight.requests.per.connection=1, valueBound=5000, linger.ms=100000, messageSize=100,
compression.type=snappy

Producer Batch Size Avg:           10429.08
Broker batch size: ~9.4K
----------------------
max.in.flight.requests.per.connection=1, valueBound=5000, linger.ms=100000, messageSize=1000,
compression.type=snappy
{noformat}



> Producer's throughput lower with compressed data after KIP-31/32
> ----------------------------------------------------------------
>
>                 Key: KAFKA-3565
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3565
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Ismael Juma
>            Priority: Critical
>             Fix For: 0.10.0.0
>
>
> Relative offsets were introduced by KIP-31 so that the broker does not have to recompress
> data (this was previously required after offsets were assigned). The implicit assumption is
> that reducing CPU usage required by recompression would mean that producer throughput for
> compressed data would increase.
> However, this doesn't seem to be the case:
> {code}
> Commit: eee95228fabe1643baa016a2d49fb0a9fe2c66bd (one before KIP-31/32)
> test_id:    2016-04-15--012.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100.compression_type=snappy
> status:     PASS
> run time:   59.030 seconds
> {"records_per_sec": 519418.343653, "mb_per_sec": 49.54}
> {code}
> Full results: https://gist.github.com/ijuma/0afada4ff51ad6a5ac2125714d748292
> {code}
> Commit: fa594c811e4e329b6e7b897bce910c6772c46c0f (KIP-31/32)
> test_id:    2016-04-15--013.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100.compression_type=snappy
> status:     PASS
> run time:   1 minute 0.243 seconds
> {"records_per_sec": 427308.818848, "mb_per_sec": 40.75}
> {code}
> Full results: https://gist.github.com/ijuma/e49430f0548c4de5691ad47696f5c87d
> The difference for the uncompressed case is smaller (and within what one would expect
> given the additional size overhead caused by the timestamp field):
> {code}
> Commit: eee95228fabe1643baa016a2d49fb0a9fe2c66bd (one before KIP-31/32)
> test_id:    2016-04-15--010.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100
> status:     PASS
> run time:   1 minute 4.176 seconds
> {"records_per_sec": 321018.17747, "mb_per_sec": 30.61}
> {code}
> Full results: https://gist.github.com/ijuma/5fec369d686751a2d84debae8f324d4f
> {code}
> Commit: fa594c811e4e329b6e7b897bce910c6772c46c0f (KIP-31/32)
> test_id:    2016-04-15--014.kafkatest.tests.benchmark_test.Benchmark.test_producer_throughput.topic=topic-replication-factor-three.security_protocol=PLAINTEXT.acks=1.message_size=100
> status:     PASS
> run time:   1 minute 5.079 seconds
> {"records_per_sec": 291777.608696, "mb_per_sec": 27.83}
> {code}
> Full results: https://gist.github.com/ijuma/1d35bd831ff9931448b0294bd9b787ed



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
