kafka-dev mailing list archives

From "Nick Travers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-5236) Regression in on-disk log size when using Snappy compression with 0.8.2 log message format
Date Sun, 14 May 2017 21:22:04 GMT
Nick Travers created KAFKA-5236:
-----------------------------------

             Summary: Regression in on-disk log size when using Snappy compression with 0.8.2 log message format
                 Key: KAFKA-5236
                 URL: https://issues.apache.org/jira/browse/KAFKA-5236
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 0.10.2.1
            Reporter: Nick Travers


We recently upgraded our brokers in our production environments from 0.10.1.1 to 0.10.2.1
and we've noticed a sizable regression in the on-disk .log file size. For some deployments
the increase was as much as 50%.

We run our brokers with the 0.8.2 log message format version. The majority of our message
volume comes from 0.10.x Java clients sending messages encoded with the Snappy codec.

Initial testing shows a regression between the two versions only when using Snappy compression
with the 0.8.2 log message format.

I also tested the 0.10.x log message formats and Gzip compression. The log sizes do not
differ in those cases, so the issue seems confined to the 0.8.2 message format with Snappy compression.

A git-bisect led me to this commit, which modified the server-side implementation of `Record`:
https://github.com/apache/kafka/commit/67f1e5b91bf073151ff57d5d656693e385726697

Here's the PR, which has more context:
https://github.com/apache/kafka/pull/2140

Here is a link to the test I used to reproduce this issue:
https://github.com/nicktrav/kafka/commit/68e8db4fa525e173651ac740edb270b0d90b8818

cc: [~hachikuji] [~junrao] [~ijuma] [~guozhang] (tagged on original PR)



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)