kafka-dev mailing list archives

From "Jay Kreps (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-750) inconsistent index offset during broker startup
Date Thu, 07 Feb 2013 22:33:12 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13573996#comment-13573996 ]

Jay Kreps commented on KAFKA-750:

Are you sure this is a clean shutdown? Do we have the log from the broker? Without that it
is pretty hard to figure out.

For issue (1), the lastOffset == 0:

Seeing last offset = 0 should happen in the event of an unclean shutdown. It is not possible
to append an actual 0 to the index; we guard against that. If the segment was from an unclean
shutdown this is expected and gets repaired when we run recovery on the segment.
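
To illustrate why an unclean shutdown would show lastOffset = 0, here is a minimal sketch rather than the actual OffsetIndex code. It assumes each 8-byte entry is a 4-byte relative offset followed by a 4-byte file position, and that the entry count is derived from the file size; the object and method names are hypothetical.

```scala
import java.nio.ByteBuffer

object ZeroFilledIndexSketch {
  // Assumed layout: 4-byte relative offset + 4-byte file position per entry.
  val EntrySize = 8

  // Treat the whole buffer as full of entries and read the relative offset
  // stored in the last slot, as a reload without recovery would.
  def lastRelativeOffset(buf: ByteBuffer): Int = {
    val entries = buf.limit() / EntrySize
    if (entries == 0) 0 else buf.getInt((entries - 1) * EntrySize)
  }

  def main(args: Array[String]): Unit = {
    // A freshly allocated buffer is zero-filled, like a preallocated index file.
    val buf = ByteBuffer.allocate(4 * EntrySize)
    // Write one real entry in slot 0: relative offset 5 at file position 100.
    buf.putInt(0, 5)
    buf.putInt(4, 100)
    // The last slot was never written, so it still reads as 0.
    println(lastRelativeOffset(buf))
  }
}
```

Under these assumptions, the untruncated tail of the file is counted as entries and the last one reads as offset 0, which is what recovery would normally repair.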

I checked the logic for clean shutdown. The only time we create a clean shutdown file is in LogManager.shutdown:
  def shutdown() {
    debug("Shutting down.")
    try {
      // close the logs
      // mark that the shutdown was clean by creating the clean shutdown marker file
      logDirs.foreach(dir => Utils.swallow(new File(dir, CleanShutdownFile).createNewFile()))
    } finally {
      // regardless of whether the close succeeded, we need to unlock the data directories
    }
    debug("Shutdown complete.")
  }

I don't think this can fail to call close on the index (which truncates the file) and still
write an index file. The close() call is indeed truncating the index.

There is an issue here which is that our resize() call does not call flush after truncating
the file. This means that a hard OS crash after a clean shutdown could lead to a corrupt index
on disk (the truncated file bits could re-appear) but also a clean shutdown file. This is
a fairly unlikely problem, though, as it requires a hard OS crash to coincide with a clean
shutdown. I don't think that happened here.
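
One way to close that gap could look like the sketch below, which truncates a file and then forces the change to disk before returning. This is not the existing resize() implementation; the object and method names are hypothetical, and only standard java.io / java.nio calls are used.

```scala
import java.io.{File, RandomAccessFile}

object TruncateAndFlushSketch {
  // Shrink the file to newLength bytes, then force the change to stable
  // storage so the truncated bytes cannot reappear after an OS crash.
  def truncateAndForce(file: File, newLength: Long): Unit = {
    val raf = new RandomAccessFile(file, "rw")
    try {
      raf.setLength(newLength)
      // force(true) also flushes file metadata, which includes the length.
      raf.getChannel.force(true)
    } finally {
      raf.close()
    }
  }
}
```

Whether the extra flush is worth its cost on every resize is a separate question, given how unlikely the failure is.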

That leaves the possibility that the size is somehow getting out of whack with the position
in the index. This can be modified in truncateTo or append, and both seem to correctly manage
the size and position.

The second issue of maxEntries vs maxIndexSize is even more curious. The maxIndexSize is a
configuration parameter so it takes whatever is configured at startup time. The maxEntries
is set to file_size/8. So if the file was newly allocated this would be impossible because
maxEntries would be exactly maxIndexSize/8. In the event of a clean shutdown any value is
possible, since close() truncates the file. The weird thing here is that the logged maxEntries
(655360) happens to be exactly half the expected value (10485760/8 = 1310720). This would be
a remarkable coincidence. I cannot explain this.
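
The arithmetic behind that observation, as a small sketch using the values from the quoted log line and the 8-byte entry size mentioned above (the object and method names are made up for illustration):

```scala
object IndexSizeArithmetic {
  // Each index entry occupies 8 bytes, so maxEntries = file_size / 8.
  val EntrySize = 8

  def expectedMaxEntries(maxIndexSize: Int): Int = maxIndexSize / EntrySize

  def main(args: Array[String]): Unit = {
    val maxIndexSize = 10485760   // from the log line
    val loggedMaxEntries = 655360 // from the log line

    println(expectedMaxEntries(maxIndexSize)) // 1310720
    println(maxIndexSize / loggedMaxEntries)  // 16
    println(loggedMaxEntries * EntrySize)     // 5242880, the file size implied by the logged maxEntries
  }
}
```

So the logged maxEntries corresponds to a 5242880-byte file, exactly half of the configured maxIndexSize.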

> inconsistent index offset during broker startup
> -----------------------------------------------
>                 Key: KAFKA-750
>                 URL: https://issues.apache.org/jira/browse/KAFKA-750
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>            Assignee: Jay Kreps
>            Priority: Blocker
>              Labels: bugs, p1
> Saw the following log during a clean restart of a broker.
> 2013/01/29 19:18:12.073 INFO [FileMessageSet] [main] [kafka] []  Creating or reloading log segment /export/content/kafka/i001_caches/topic1-3/00000000000000000000.log
> 2013/01/29 19:18:12.074 INFO [OffsetIndex] [main] [kafka] []  Created index file /export/content/kafka/i001_caches/topic1-3/00000000000000000000.index with maxEntries = 655360, maxIndexSize = 10485760, entries = 655360, lastOffset = 0
> A couple of things are weird.
> 1. There are entries in the index, but lastOffset is 0.
> 2. maxIndexSize/maxEntries = 16, instead of 8.

