incubator-kafka-dev mailing list archives

From "Swapnil Ghike (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-521) Refactor Log subsystem
Date Tue, 27 Nov 2012 19:41:58 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504871#comment-13504871 ]

Swapnil Ghike commented on KAFKA-521:
-------------------------------------

1. I see, thanks for the clarification. If there are multiple compression codecs in the same
set, would it make sense to have a precedence order among them to decide which compression
codec is used for compressing all the messages together? Right now it seems that the codec
of the last compressed message will win.

2. If you are using IntelliJ, you can right-click on the file name in the project structure
and click on "Optimize Imports". The unused imports that I see are:
Log:
import kafka.api.OffsetRequest
import java.util.{Comparator, Collections, ArrayList}
import scala.math._
import kafka.server.BrokerTopicStat

LogManager:
import kafka.log.Log._

3. Sure, we can talk.

4. Yes, that was a good catch. It's also less prone to introducing new bugs this way.

I am not super confident about my understanding of the non-Log* part of this patch, so it
would be good if someone else could also review that part.
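
To make point 1 concrete, here is a minimal sketch of what a precedence order among codecs could look like. The codec names, the `CodecPrecedence` object, and the ordering itself are all hypothetical and are not taken from the patch or from Kafka's actual classes; the real ordering would be a design decision.

```scala
// Hypothetical codec model; these names are illustrative, not Kafka's classes.
sealed trait Codec
case object NoCompression extends Codec
case object Gzip extends Codec
case object Snappy extends Codec

object CodecPrecedence {
  // Assumed ordering: a codec earlier in this list wins over a later one.
  val precedence: List[Codec] = List(Gzip, Snappy)

  // Pick the highest-precedence codec present among the messages,
  // falling back to NoCompression when no message is compressed.
  def choose(codecs: Iterable[Codec]): Codec = {
    val present: Set[Codec] = codecs.toSet
    precedence.find(present.contains).getOrElse(NoCompression)
  }
}
```

With this approach the result depends only on which codecs appear in the set, not on which compressed message happens to come last.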
                
> Refactor Log subsystem
> ----------------------
>
>                 Key: KAFKA-521
>                 URL: https://issues.apache.org/jira/browse/KAFKA-521
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jay Kreps
>         Attachments: KAFKA-521-v1.patch, KAFKA-521-v2.patch, KAFKA-521-v3.patch
>
>
> There are a number of items it would be nice to clean up in the log subsystem:
> 1. Misc. funky apis in Log and LogManager
> 2. Much of the functionality in Log should move into LogSegment along with corresponding
> tests
> 3. We should remove SegmentList and instead use a ConcurrentSkipListMap
> The general idea of the refactoring falls into two categories. First, improve and thoroughly
> document the public APIs. Second, have a clear delineation of responsibility between the various
> layers:
> 1. LogManager is responsible for the creation and deletion of logs as well as the retention
> of data in log segments. LogManager is the only layer aware of partitions and topics. LogManager
> consists of a bunch of individual Log instances and interacts with them only through their
> public API (mostly true today).
> 2. Log represents a totally ordered log. Log is responsible for reading, appending, and
> truncating the log. A log consists of a bunch of LogSegments. Currently much of the functionality
> in Log should move into LogSegment with Log interacting only through the Log interface. Currently
> we reach around this a lot to call into FileMessageSet and OffsetIndex.
> 3. A LogSegment consists of an OffsetIndex and a FileMessageSet. It supports largely
> the same APIs as Log, but now localized to a single segment.
> This cleanup will simplify testing and debugging because it will make the responsibilities
> and guarantees at each layer clearer.
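
As a rough illustration of item 3 in the first list above (replacing SegmentList with a ConcurrentSkipListMap), segments could be keyed by their base offset so that finding the segment for a given offset becomes a floor lookup. This is only a sketch under that assumption; `Segment` and `SegmentIndex` are hypothetical stand-ins, not the classes in the patch.

```scala
import java.util.concurrent.ConcurrentSkipListMap

// Hypothetical stand-in for a log segment; only the base offset matters here.
final case class Segment(baseOffset: Long)

class SegmentIndex {
  // Segments keyed by base offset; the skip list keeps the keys sorted
  // and supports concurrent reads and writes.
  private val segments = new ConcurrentSkipListMap[java.lang.Long, Segment]()

  def add(segment: Segment): Unit =
    segments.put(segment.baseOffset, segment)

  // The candidate segment for `offset` is the one with the greatest base
  // offset less than or equal to it; None if the offset precedes all segments.
  def segmentFor(offset: Long): Option[Segment] =
    Option(segments.floorEntry(offset)).map(_.getValue)
}
```

The map's `floorEntry` call does the ordered search that a hand-rolled segment list would otherwise have to implement itself.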
