incubator-kafka-dev mailing list archives

From "Jay Kreps (JIRA)" <>
Subject [jira] [Updated] (KAFKA-521) Refactor Log subsystem
Date Mon, 26 Nov 2012 20:22:59 GMT


Jay Kreps updated KAFKA-521:

    Attachment: KAFKA-521-v2.patch

Updated patch. In addition to the items in v1, this patch has the following changes:
1. Rebased again
2. FileMessageSet: Renamed the "limit" variable used for slicing to "end", since it was
confusing whether this was the absolute position of the final byte in the slice or the
relative offset from the given start position ("limit" usually means the latter).
3. FileMessageSet, LogSegment, Log: Found a bug in LogSegment.recover(). If the message
size field was corrupted, the recovery procedure could run out of memory, since it tries
to load a message of the corrupt size. To fix this I now pass the maximum message size that
we specify in the config into the recovery procedure, and in turn into FileMessageSet.iterator,
and treat any message in the log larger than this maximum as a corruption.
4. Log: Fix a bug in Log.truncateTo--we need to delete the old segments before creating the
new segment to ensure we don't delete the new segment.
5. LogSegment: Added a new optimization to LogSegment.translateOffset. We potentially do two
translations per read()--one for the startOffset and one for the end offset (if there is one).
It is possible that the nearest index entry lower bound on the end offset is actually lower
than the startOffset--potentially much lower. So in this case rather than starting the search
from this position it is better to start from the translated startOffset since it is guaranteed
to be <= endOffset. A nice special case of this is that if you fetch a single message at
a time you never do more than one message read in Log.searchFor.
6. I did an assessment of unit test coverage and added test cases where I thought there were
particularly glaring holes. Added cases covering: index rebuilding, log corruption, iterating
a FileMessageSet slice, and truncating a FileMessageSet. I also expanded a few other existing
tests.
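The corruption guard described in item 3 can be sketched roughly as follows. This is a minimal illustration, not Kafka's actual recovery code: it assumes a simplified record layout of a 4-byte size field followed by the payload, and the class, method, and size-limit names are all made up for the example.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of bounding message sizes during recovery.
// Assumed record layout: [4-byte size][payload bytes].
public class RecoverySketch {
    // Stand-in for the broker's configured maximum message size.
    static final int MAX_MESSAGE_SIZE = 1_000_000;

    /** Returns the number of valid bytes before the first corrupt entry. */
    public static int validBytes(ByteBuffer log) {
        int valid = 0;
        while (log.remaining() >= 4) {
            int size = log.getInt(log.position());
            // A negative size, a size over the configured maximum, or a size
            // running past the end of the buffer is treated as corruption:
            // stop scanning instead of trying to allocate `size` bytes.
            if (size < 0 || size > MAX_MESSAGE_SIZE || log.remaining() < 4 + size)
                break;
            log.position(log.position() + 4 + size);
            valid = log.position();
        }
        return valid;
    }
}
```

Recovery would then truncate the log to the returned valid-byte count, discarding the corrupt tail.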
> Refactor Log subsystem
> ----------------------
>                 Key: KAFKA-521
>                 URL:
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jay Kreps
>         Attachments: KAFKA-521-v1.patch, KAFKA-521-v2.patch
> There are a number of items it would be nice to cleanup in the log subsystem:
> 1. Misc. funky apis in Log and LogManager
> 2. Much of the functionality in Log should move into LogSegment along with corresponding
tests
> 3. We should remove SegmentList and instead use a ConcurrentSkipListMap
> The general idea of the refactoring falls into two categories. First, improve and thoroughly
document the public APIs. Second, have a clear delineation of responsibility between the various
layers:
> 1. LogManager is responsible for the creation and deletion of logs as well as the retention
of data in log segments. LogManager is the only layer aware of partitions and topics. LogManager
consists of a bunch of individual Log instances and interacts with them only through their
public API (mostly true today).
> 2. Log represents a totally ordered log. Log is responsible for reading, appending, and
truncating the log. A log consists of a bunch of LogSegments. Much of the functionality currently
in Log should move into LogSegment, with Log interacting only through the LogSegment interface;
today we reach around this a lot to call into FileMessageSet and OffsetIndex directly.
> 3. A LogSegment consists of an OffsetIndex and a FileMessageSet. It supports largely
the same APIs as Log, but now localized to a single segment.
> This cleanup will simplify testing and debugging because it will make the responsibilities
and guarantees at each layer more clear.
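Point 3 of the description (replacing SegmentList with a ConcurrentSkipListMap) could look roughly like the sketch below. This is illustrative only, not Kafka's actual code: a String stands in for a LogSegment, and the method names are invented for the example. The map is keyed by each segment's base offset, so floorEntry gives lock-free lookup of the segment that would contain a given offset.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: segments keyed by base offset in a skip-list map.
public class SegmentMapSketch {
    private final ConcurrentSkipListMap<Long, String> segments =
        new ConcurrentSkipListMap<>();

    public void addSegment(long baseOffset, String segment) {
        segments.put(baseOffset, segment);
    }

    /** The segment whose base offset is the greatest one <= offset, or null. */
    public String segmentFor(long offset) {
        Map.Entry<Long, String> entry = segments.floorEntry(offset);
        return entry == null ? null : entry.getValue();
    }
}
```

Compared with a hand-rolled segment list, this gives thread-safe iteration in offset order (for retention and truncation) plus O(log n) floor lookups for reads, without external locking.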

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
