incubator-kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jay Kreps (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-521) Refactor Log subsystem
Date Wed, 28 Nov 2012 06:02:58 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jay Kreps updated KAFKA-521:
----------------------------

    Attachment: KAFKA-521-v5.patch

New patch. Addresses your comments (see below) and also one more refactoring:
1. Swap argument order in Log.read. Now it is read(start, end, size), which makes a lot more
sense. The odd order was originally due to default args which I ended up not using anyway.

Your comments:
1. Ack yes this test is totally messed up. Fixed.
2. I clarified a bit. It is technically an upper bound on the end position.
3.1, 3.2. Agreed, but since we had a follow-up item to generalize that API I tried not to
touch it. There is already a lot in this patch. Basically there are so many improvements we
could make there that there is no point making just one or two. The performance issue shouldn't
be too pressing in any case.
4.1. Done, did this throughout. Would be nice to do this globally, actually. We don't need
to be dogmatic, but it is worth thinking about it.
4.2 Fixed.
4.3, 4.4. Agreed. Filed KAFKA-636 to cover all delete related issues. I will probably do that
next, but I don't think we want to mix it in with this one.
4.5. Are you refering to segments.firstEntry.getValue.baseOffset? It is an invariant of the
whole class that segments be non-empty. I can check it but the resulting exception will not
be more informative, I think, and there is no graceful handling of this. The important thing
is to validate that this invariant actually holds...
4.6 Ditto.
5.1. No, I check that. The check is (segment.size > 0 && time.milliseconds - segment.created
> rollIntervalMs). This makes more sense. We had that first append time that we weirdly
reset at times and were using that to infer that the segment was non-empty, it was convoluted.
This is more straight forward: (1) a create time based on when the segment is instantiated
(still not quite right in the case of restart), and (2) a roll criteria that says, in effect,
"roll non-empty segments created more than this threshold". 
5.2 Deleted.
6. Yes. The rationale is that retries lead to duplicates which breaks our normal guarantee.
I think we should default to very high timeouts and no retries and let the user tune appropriately
after thinking about what they actually want. Arguably this should be done on a separate ticket.
                
> Refactor Log subsystem
> ----------------------
>
>                 Key: KAFKA-521
>                 URL: https://issues.apache.org/jira/browse/KAFKA-521
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Jay Kreps
>         Attachments: KAFKA-521-v1.patch, KAFKA-521-v2.patch, KAFKA-521-v3.patch, KAFKA-521-v4.patch,
KAFKA-521-v5.patch
>
>
> There are a number of items it would be nice to cleanup in the log subsystem:
> 1. Misc. funky apis in Log and LogManager
> 2. Much of the functionality in Log should move into LogSegment along with corresponding
tests
> 3. We should remove SegmentList and instead use a ConcurrentSkipListMap
> The general idea of the refactoring fall into two categories. First, improve and thoroughly
document the public APIs. Second, have a clear delineation of responsibility between the various
layers:
> 1. LogManager is responsible for the creation and deletion of logs as well as the retention
of data in log segments. LogManager is the only layer aware of partitions and topics. LogManager
consists of a bunch of individual Log instances and interacts with them only through their
public API (mostly true today).
> 2. Log represents a totally ordered log. Log is responsible for reading, appending, and
truncating the log. A log consists of a bunch of LogSegments. Currently much of the functionality
in Log should move into LogSegment with Log interacting only through the Log interface. Currently
we reach around this a lot to call into FileMessageSet and OffsetIndex.
> 3. A LogSegment consists of an OffsetIndex and a FileMessageSet. It supports largely
the same APIs as Log, but now localized to a single segment.
> This cleanup will simplify testing and debugging because it will make the responsibilities
and guarantees at each layer more clear.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message