kafka-dev mailing list archives

From "huxihx (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-6425) Calculating cleanBytes in LogToClean might not be correct
Date Fri, 05 Jan 2018 09:40:03 GMT
huxihx created KAFKA-6425:
-----------------------------

             Summary: Calculating cleanBytes in LogToClean might not be correct
                 Key: KAFKA-6425
                 URL: https://issues.apache.org/jira/browse/KAFKA-6425
             Project: Kafka
          Issue Type: Bug
          Components: core
    Affects Versions: 1.0.0
            Reporter: huxihx


In class `LogToClean`, the calculation for `cleanBytes` is as below:
{code:java}
val cleanBytes = log.logSegments(-1, firstDirtyOffset).map(_.size.toLong).sum
{code}

Most of the time, `firstDirtyOffset` is the base offset of the active segment, which works
well with `log.logSegments`: we can calculate `cleanBytes` by safely summing the sizes of
all log segments whose base offset is less than `firstDirtyOffset`.

However, things changed after `firstUnstableOffset` was introduced. Users can indirectly
move this offset to a non-base offset (by changing the log start offset, for instance). In this
case, it is not correct to add the full size of the log segment that contains the offset. Instead,
we should only count the bytes between that segment's base offset and `firstUnstableOffset`.

Here is an example. Say I have three log segments:
0L    --> log segment1, size: 1000 bytes
1234L --> log segment2, size: 1000 bytes
4567L --> active log segment, current size: 500 bytes

With the current code, if `firstUnstableOffset` is deliberately set to 2000L (which is
possible, since it is lower-bounded by the log start offset and a user can explicitly change the
log start offset), then `cleanBytes` is calculated as 2000 bytes, which is wrong. The expected
value is 1000 + (bytes between offset 1234L and 2000L).
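
The difference between the two calculations can be sketched as below. This is a minimal, self-contained model and not Kafka's actual code: `Segment`, `CleanBytesSketch`, and the `bytesBelow` lookup are hypothetical names introduced only for illustration, and the filter on `baseOffset` assumes that `log.logSegments(-1, firstDirtyOffset)` selects every segment whose base offset is strictly below `firstDirtyOffset`, as described above.

```scala
// Hypothetical stand-in for a log segment; this is not Kafka's LogSegment class.
final case class Segment(baseOffset: Long, sizeBytes: Long)

object CleanBytesSketch {
  // Mirrors the calculation quoted in this issue: every segment whose base
  // offset is below firstDirtyOffset contributes its full size.
  def wholeSegments(segments: Seq[Segment], firstDirtyOffset: Long): Long =
    segments.filter(_.baseOffset < firstDirtyOffset).map(_.sizeBytes).sum

  // Suggested alternative: the last selected segment contributes only the
  // bytes that lie below firstDirtyOffset. `bytesBelow` is an assumed lookup
  // that returns, for a segment and an offset, how many bytes of that segment
  // hold records with lower offsets.
  def partialLastSegment(segments: Seq[Segment], firstDirtyOffset: Long)
                        (bytesBelow: (Segment, Long) => Long): Long = {
    val selected = segments.sortBy(_.baseOffset).filter(_.baseOffset < firstDirtyOffset)
    if (selected.isEmpty) 0L
    else selected.init.map(_.sizeBytes).sum + bytesBelow(selected.last, firstDirtyOffset)
  }
}
```

With the three segments from the example and an offset of 2000L, `wholeSegments` returns 2000, while `partialLastSegment` returns 1000 plus whatever the lookup reports for segment2 below offset 2000L. When the offset is a segment base offset and the lookup reports the full size of the preceding segment, both functions agree.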

[~junrao] [~ijuma] Does all of this make sense?





