Date: Fri, 7 Nov 2014 04:54:33 +0000 (UTC)
From: "Jason Rosenberg (JIRA)"
To: dev@kafka.apache.org
Subject: [jira] [Created] (KAFKA-1758) corrupt recovery file prevents startup

Jason Rosenberg created KAFKA-1758:
--------------------------------------

             Summary: corrupt recovery file prevents startup
                 Key: KAFKA-1758
                 URL: https://issues.apache.org/jira/browse/KAFKA-1758
             Project: Kafka
          Issue Type: Bug
            Reporter: Jason Rosenberg

Hi,

We recently had a kafka node go down suddenly.
When it came back up, it apparently had a corrupt recovery file, and refused to start up:

{code}
2014-11-06 08:17:19,299 WARN [main] server.KafkaServer - Error starting up KafkaServer
java.lang.NumberFormatException: For input string: "^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:481)
        at java.lang.Integer.parseInt(Integer.java:527)
        at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:229)
        at scala.collection.immutable.StringOps.toInt(StringOps.scala:31)
        at kafka.server.OffsetCheckpoint.read(OffsetCheckpoint.scala:76)
        at kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:106)
        at kafka.log.LogManager$$anonfun$loadLogs$1.apply(LogManager.scala:105)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
        at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
        at kafka.log.LogManager.loadLogs(LogManager.scala:105)
        at kafka.log.LogManager.<init>(LogManager.scala:57)
        at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:275)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:72)
{code}

And the app is under a monitor (so it was repeatedly restarting and failing with this error for several minutes before we got to it)…

We moved the ‘recovery-point-offset-checkpoint’ file out of the way, and it then restarted cleanly (but of course re-synced all its data from replicas, so we had no data loss).

Anyway, I’m wondering if that’s the expected behavior? Or shouldn’t it declare the file corrupt and then proceed automatically to an unclean restart?

Should this NumberFormatException be handled a bit more gracefully?

We saved the corrupt file if it’s worth inspecting (although I doubt it will be useful!)…

The corrupt files appeared to be all zeroes.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
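
To make the "handle it more gracefully" question concrete, here is a minimal sketch in Python. It is an illustration only, not Kafka's code. It assumes the checkpoint file is a version line, an entry-count line, and then one "topic partition offset" line per entry; the report does not confirm that layout. The names parse_checkpoint, load_recovery_points and CorruptCheckpointError are hypothetical. Whether falling back to an empty set of recovery points is the right policy is exactly the open question in the report.

```python
from pathlib import Path


class CorruptCheckpointError(Exception):
    """Raised when checkpoint text cannot be parsed (hypothetical name)."""


def parse_checkpoint(text):
    """Parse checkpoint text into {(topic, partition): offset}.

    Assumed layout (not confirmed by the report): line 1 is a version
    number, line 2 is the entry count, and every remaining line is
    "topic partition offset".
    """
    lines = text.splitlines()
    try:
        version = int(lines[0])
        expected = int(lines[1])
        offsets = {}
        for line in lines[2:]:
            topic, partition, offset = line.split()
            offsets[(topic, int(partition))] = int(offset)
    except (IndexError, ValueError) as exc:
        # A file of NUL bytes ends up here: int() rejects the first line.
        raise CorruptCheckpointError(f"malformed checkpoint: {exc}") from exc
    if version != 0 or len(offsets) != expected:
        raise CorruptCheckpointError("unexpected version or entry count")
    return offsets


def load_recovery_points(path):
    """Return recovery points, or {} if the file is missing or corrupt."""
    path = Path(path)
    try:
        # latin-1 maps every byte to a character, so decoding cannot fail.
        text = path.read_text(encoding="latin-1")
    except FileNotFoundError:
        return {}
    try:
        return parse_checkpoint(text)
    except CorruptCheckpointError:
        # Set the bad file aside for inspection instead of aborting startup,
        # mirroring the manual workaround described in the report.
        path.replace(path.with_name(path.name + ".corrupt"))
        return {}
```

With this shape, a checkpoint file consisting only of zero bytes is renamed with a ".corrupt" suffix and treated as if no recovery points were recorded, rather than raising a number-format error out of startup.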