incubator-kafka-dev mailing list archives

From "Jay Kreps (JIRA)" <>
Subject [jira] [Commented] (KAFKA-521) Refactor Log subsystem
Date Tue, 27 Nov 2012 18:27:58 GMT


Jay Kreps commented on KAFKA-521:

1. The point of the logic is to recompress all the messages in the append together using one
of the given compression codecs, if there are any. This logic is a bit weird, and Neha suggested
having the server set its own compression codec on a per-topic basis irrespective of the client
codec, which actually does seem better. But in any case, the current purpose of that code is to
detect the compression codec used in the message set for use during recompression. The corner
case is when there are multiple compression codecs in use in the same set (totally legal). So
the logic I want is that if there are 10 gzipped messages followed by 1 uncompressed message,
I want to gzip all the messages together, so I can't just reset the codec on each message:
compression (of any sort) has to override non-compression.
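The rule in point 1 could be sketched roughly as follows (a hypothetical illustration, not the actual patch code; the enum and method names are made up):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch of the codec-selection rule described above: when recompressing
// an append, any compression codec present in the message set overrides
// no-compression, so we cannot simply take the last message's codec.
public class CodecResolver {
    enum Codec { NONE, GZIP, SNAPPY }

    static Codec targetCodec(List<Codec> messageCodecs) {
        Codec chosen = Codec.NONE;
        for (Codec c : messageCodecs) {
            if (c != Codec.NONE) {
                chosen = c; // compression of any sort overrides non-compression
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // 10 gzipped messages followed by 1 uncompressed message: gzip wins.
        List<Codec> codecs = new ArrayList<>(Collections.nCopies(10, Codec.GZIP));
        codecs.add(Codec.NONE);
        System.out.println(targetCodec(codecs)); // prints GZIP
    }
}
```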
2. Can you be more specific? My IDE doesn't detect these. :-(
3. Not sure if I understand what you are saying, but I was a little nervous about this change.
I will follow up with you to understand better.
4. Yes. The corner case is that moving the deletion outside the synchronized block introduces
the possibility of accidentally deleting the newly created segment if it has the same offset
as one of the segments being deleted. That is, say that truncate saves out a list of segments
to delete, including one with filename X, and removes these from the Map; then it creates a
new segment; then it does the actual deletions. What if the new segment is also named X? Rather
than handle that case I just do it in the lock, on the assumption that truncate is going
to be rare and truncating a whole segment will be very rare. This is what that comment meant.
On the other hand, deleting old segments I expect to be frequent, so I tried to properly handle
that case without locking around the delete. I am not 100% confident in the solution, though.
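The filename collision in point 4 comes from segment files being named by their base offset. A toy illustration (not Kafka's actual code, just the shape of the problem):

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Illustrative sketch of the collision described above: segment files are
// named by base offset, so a segment created by truncate can reuse the
// name of a segment that is still scheduled for deletion.
public class TruncateNameCollision {
    static String fileName(long baseOffset) {
        // Kafka names segment files by zero-padded base offset.
        return String.format("%020d.log", baseOffset);
    }

    public static void main(String[] args) {
        ConcurrentSkipListMap<Long, String> segments = new ConcurrentSkipListMap<>();
        segments.put(0L, fileName(0));
        segments.put(100L, fileName(100));

        // truncate: remove every segment, remembering files to delete later...
        String scheduledForDeletion = segments.remove(0L);
        segments.remove(100L);

        // ...then create a fresh empty segment starting back at offset 0.
        String newSegment = fileName(0);
        segments.put(0L, newSegment);

        // The doomed file and the new file share a name, so deleting the old
        // one outside the lock would destroy the new segment. Hence truncate
        // keeps the deletion inside the synchronized block.
        System.out.println(scheduledForDeletion.equals(newSegment)); // prints true
    }
}
```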
5. Since this is basically a programmer error that should never occur I think it should be
okay either way.
> Refactor Log subsystem
> ----------------------
>                 Key: KAFKA-521
>                 URL:
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jay Kreps
>         Attachments: KAFKA-521-v1.patch, KAFKA-521-v2.patch, KAFKA-521-v3.patch
> There are a number of items it would be nice to cleanup in the log subsystem:
> 1. Misc. funky APIs in Log and LogManager
> 2. Much of the functionality in Log should move into LogSegment along with corresponding
> 3. We should remove SegmentList and instead use a ConcurrentSkipListMap
> The general idea of the refactoring falls into two categories. First, improve and thoroughly
document the public APIs. Second, have a clear delineation of responsibility between the various
layers:
> 1. LogManager is responsible for the creation and deletion of logs as well as the retention
of data in log segments. LogManager is the only layer aware of partitions and topics. LogManager
consists of a bunch of individual Log instances and interacts with them only through their
public API (mostly true today).
> 2. Log represents a totally ordered log. Log is responsible for reading, appending, and
truncating the log. A log consists of a bunch of LogSegments. Currently much of the functionality
in Log should move into LogSegment, with Log interacting with segments only through the LogSegment
interface. Currently we reach around this a lot to call into FileMessageSet and OffsetIndex.
> 3. A LogSegment consists of an OffsetIndex and a FileMessageSet. It supports largely
the same APIs as Log, but now localized to a single segment.
> This cleanup will simplify testing and debugging because it will make the responsibilities
and guarantees at each layer more clear.
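Item 3 of the list above (replacing SegmentList with a ConcurrentSkipListMap) could look roughly like this; a minimal sketch with placeholder segment values, not the actual patch:

```java
import java.util.concurrent.ConcurrentSkipListMap;

// Sketch of keying segments by base offset in a ConcurrentSkipListMap:
// a read finds the segment containing a given offset via floorEntry
// (greatest base offset <= the target), with no custom SegmentList needed.
public class SegmentLookup {
    public static void main(String[] args) {
        ConcurrentSkipListMap<Long, String> segments = new ConcurrentSkipListMap<>();
        segments.put(0L, "segment-0");
        segments.put(100L, "segment-100");
        segments.put(250L, "segment-250");

        // Offset 180 falls in the segment whose base offset is the greatest
        // key <= 180, i.e. segment-100.
        System.out.println(segments.floorEntry(180L).getValue()); // prints segment-100
    }
}
```

The map also gives concurrent readers a consistent view while segments are added or removed, which is part of the appeal over SegmentList.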

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see:
