From: "Bello, Bob"
To: users@kafka.apache.org
Date: Tue, 20 Aug 2013 13:37:59 -0600
Subject: Possible corrupted index (Kafka 0.8)

Hello Kafka Club,

We are running a July 29th git pull of Kafka 0.8 on Linux with the Sun JDK 1.7.0_25 (64-bit).

We have what appears to be a corrupted index for a log file. This has occurred on a low-volume topic, on a single partition:

- The leader Kafka broker thinks this topic is at offset 1808.
- The replication-offset-checkpoint file says the offset is 1808: rain-burn-in 75 1808
- The replica Kafka broker has a checkpoint offset of 1539.

It appears that replication breaks, which results in 100-200 MB/s of NIO bandwidth being spent trying to replicate this topic/partition.

If I use the simple console consumer to consume this topic/partition, I can consume up to offset 1539:

JAVA_HOME=~/jdk1.7.0_25 ./kafka-simple-consumer-shell.sh --broker-list tm1-kafkabroker101:9092 --partition 75 --print-offsets --topic rain-burn-in --skip-message-on-error | grep 'next offset'
...
next offset = 1535
next offset = 1536
next offset = 1537
next offset = 1538
next offset = 1539

Then the simple consumer just stops (it does not crash, but appears stuck).

If I tell the simple consumer to start at offset 1541, it can continue to consume messages until it reaches the most current offset:

JAVA_HOME=~/jdk1.7.0_25 ./kafka-simple-consumer-shell.sh --broker-list tm1-kafkabroker101:9092 --partition 75 --print-offsets --topic rain-burn-in --skip-message-on-error --offset 1541 | grep 'next offset'
next offset = 1800
next offset = 1801
next offset = 1802
next offset = 1803
next offset = 1804
next offset = 1805
next offset = 1806
next offset = 1807
next offset = 1808

From the Kafka server.log file, I found the following errors:

2013-08-20 11:15:55 ERROR server.KafkaApis - [KafkaApi-1] Error when processing fetch request for partition [rain-burn-in,75] offset 1047030 from consumer with correlation id 0
kafka.common.OffsetOutOfRangeException: Request for offset 1047030 but we only have log segments in the range 0 to 1801.
    at kafka.log.Log.read(Unknown Source)
    at kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSet(Unknown Source)
    at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(Unknown Source)
    at kafka.server.KafkaApis$$anonfun$kafka$server$KafkaApis$$readMessageSets$1.apply(Unknown Source)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
    at scala.collection.immutable.Map$Map1.foreach(Map.scala:109)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
    at scala.collection.AbstractTraversable.map(Traversable.scala:105)
    at kafka.server.KafkaApis.kafka$server$KafkaApis$$readMessageSets(Unknown Source)
    at kafka.server.KafkaApis.handleFetchRequest(Unknown Source)
    at kafka.server.KafkaApis.handle(Unknown Source)
    at kafka.server.KafkaRequestHandler.run(Unknown Source)
    at java.lang.Thread.run(Thread.java:724)

Is there a suggested course of action short of removing the log and index? I was looking for documentation on the log index format (perhaps to modify or fix it) but did not find it anywhere.
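
In case it helps anyone repeat the leader/replica comparison, here is a minimal sketch of how the checkpoint offsets could be diffed. It assumes each replication-offset-checkpoint file holds one "topic partition offset" row per partition, like the row quoted earlier, and it simply skips any other lines. The function names are hypothetical, the two header lines in the sample text are an assumption about the file layout, and the replica row is constructed from the offset reported for the replica rather than copied from its file.

```python
from typing import Dict, Optional, Tuple

PartitionKey = Tuple[str, int]


def parse_checkpoint(text: str) -> Dict[PartitionKey, int]:
    """Collect (topic, partition) -> offset from checkpoint-style text."""
    offsets: Dict[PartitionKey, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only "topic partition offset" rows; anything else is skipped.
        if len(fields) == 3 and fields[1].isdigit() and fields[2].isdigit():
            offsets[(fields[0], int(fields[1]))] = int(fields[2])
    return offsets


def lagging_partitions(
    leader: Dict[PartitionKey, int],
    replica: Dict[PartitionKey, int],
) -> Dict[PartitionKey, Tuple[int, Optional[int]]]:
    """Return the partitions whose replica offset differs from the leader's."""
    return {
        key: (offset, replica.get(key))
        for key, offset in leader.items()
        if replica.get(key) != offset
    }


# Leader row as quoted in this message; header lines are an assumed layout.
leader = parse_checkpoint("0\n1\nrain-burn-in 75 1808\n")
# Replica row constructed from the reported checkpoint offset of 1539.
replica = parse_checkpoint("0\n1\nrain-burn-in 75 1539\n")

print(lagging_partitions(leader, replica))
```

In practice the two strings would be read from the checkpoint file in each broker's log directory. A partition that keeps showing up in the result across several checkpoint flushes is one where the replica is not catching up.
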

Thanks
-Bob