kafka-dev mailing list archives

From "Jason Rosenberg (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-1758) corrupt recovery file prevents startup
Date Fri, 07 Nov 2014 04:54:33 GMT
Jason Rosenberg created KAFKA-1758:

             Summary: corrupt recovery file prevents startup
                 Key: KAFKA-1758
                 URL: https://issues.apache.org/jira/browse/KAFKA-1758
             Project: Kafka
          Issue Type: Bug
            Reporter: Jason Rosenberg


We recently had a Kafka node go down suddenly. When it came back up, it apparently had a corrupt
recovery file, and refused to start up:

2014-11-06 08:17:19,299  WARN [main] server.KafkaServer - Error starting up KafkaServer
java.lang.NumberFormatException: For input string: "^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:481)
        at java.lang.Integer.parseInt(Integer.java:527)
        at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
        at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
        at kafka.server.OffsetCheckpoint.read(OffsetCheckpoint.scala:76)
        at kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:106)
        at kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:105)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
        at kafka.log.LogManager.loadLogs(LogManager.scala:105)
        at kafka.log.LogManager.<init>(LogManager.scala:57)
        at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:275)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:72)

The app runs under a monitor, so it was repeatedly restarting and failing with this error
for several minutes before we got to it.

We moved the ‘recovery-point-offset-checkpoint’ file out of the way, and it then restarted
cleanly (but of course re-synced all its data from replicas, so we had no data loss).

Anyway, I’m wondering if that’s the expected behavior? Or shouldn’t the broker declare the file
corrupt and then proceed automatically to an unclean restart?

Should this NumberFormatException be handled a bit more gracefully?
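
To make the question concrete, here is a minimal sketch of what more graceful handling could look like. It is not Kafka's actual code: the class and method names are hypothetical, and the file layout (a version line, an entry-count line, then one "topic partition offset" line per entry) is an assumption made for illustration. The idea is to parse strictly, but to treat any parse failure as "no recovery points known" rather than aborting startup.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Kafka.
final class CheckpointReader {

    // Strict parser for the ASSUMED layout: version, entry count, then
    // "topic partition offset" lines. Throws IllegalArgumentException
    // (NumberFormatException is a subclass) on malformed content.
    public static Map<String, Long> parse(List<String> lines) {
        if (lines.size() < 2) {
            throw new IllegalArgumentException("checkpoint has fewer than 2 lines");
        }
        int version = Integer.parseInt(lines.get(0).trim());
        if (version != 0) {
            throw new IllegalArgumentException("unknown checkpoint version " + version);
        }
        int expected = Integer.parseInt(lines.get(1).trim());
        Map<String, Long> offsets = new HashMap<>();
        for (String line : lines.subList(2, lines.size())) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate blank lines
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length != 3) {
                throw new IllegalArgumentException("malformed entry: " + trimmed);
            }
            String key = parts[0] + "-" + Integer.parseInt(parts[1]);
            offsets.put(key, Long.parseLong(parts[2]));
        }
        if (offsets.size() != expected) {
            throw new IllegalArgumentException(
                "expected " + expected + " entries, found " + offsets.size());
        }
        return offsets;
    }

    // Lenient wrapper: an unreadable or corrupt file is logged and treated as
    // empty, so the caller falls back to full recovery instead of failing.
    public static Map<String, Long> readOrEmpty(Path file) {
        try {
            return parse(Files.readAllLines(file, StandardCharsets.UTF_8));
        } catch (IOException | IllegalArgumentException e) {
            System.err.println("Ignoring corrupt checkpoint " + file + ": " + e);
            return new HashMap<>();
        }
    }
}
```

With a wrapper like this, a file consisting only of zero bytes yields an empty map and a warning, which would correspond to the unclean restart described above. Whether that is the right default for a broker, as opposed to failing loudly, is exactly the open question.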

We saved the corrupt file if it’s worth inspecting (although I doubt it will be useful!).

The corrupt files appeared to be all zeroes.
