kafka-jira mailing list archives

From "Jason Gustafson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-5634) Replica fetcher thread crashes due to OffsetOutOfRangeException
Date Tue, 25 Jul 2017 17:15:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-5634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16100381#comment-16100381 ]

Jason Gustafson commented on KAFKA-5634:
----------------------------------------

I can confirm that this bug occurs when a fetch response contains a high watermark lower than
the log start offset. It is easy to reproduce: create a replicated topic configured with
cleanup.policy=compact,delete and a low retention.ms, then quickly produce records whose
timestamps are older than the retention period. Here is the topic command I used:
{code}
bin/kafka-topics.sh --create --topic foo --replication-factor 2 --partitions 1 \
  --config retention.ms=60000 --config cleanup.policy=compact,delete --zookeeper localhost:2181
{code}
Then I ran a simple producer loop that wrote records with timestamps from a day ago. Almost
immediately I saw the exception and was able to confirm the contents of the fetch response
data.
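
For readers who want to try the reproduction, the producer loop described above could look roughly like the sketch below. This is not the code used for the original test. It assumes the third-party kafka-python client, a broker at localhost:9092, and hypothetical helper names (`stale_records`, `produce_stale_records`). Every record carries a key, because a topic with a compact cleanup policy does not accept records without one.

```python
import time

ONE_DAY_MS = 24 * 60 * 60 * 1000


def stale_records(count, now_ms=None, age_ms=ONE_DAY_MS):
    """Yield (key, value, timestamp_ms) tuples whose timestamps lie age_ms in the past."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    for i in range(count):
        # Compacted topics need a key on every record.
        key = ("key-%d" % i).encode("utf-8")
        value = ("value-%d" % i).encode("utf-8")
        yield key, value, now_ms - age_ms


def produce_stale_records(count, topic="foo", bootstrap_servers="localhost:9092"):
    """Send `count` day-old records to `topic` (broker address is an assumption)."""
    # Imported lazily so stale_records() works without the client installed.
    from kafka import KafkaProducer  # kafka-python package (assumed client)

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    for key, value, timestamp_ms in stale_records(count):
        producer.send(topic, key=key, value=value, timestamp_ms=timestamp_ms)
    producer.flush()
    producer.close()
```

Calling `produce_stale_records(100000)` against the topic created above writes records that are already past the 60000 ms retention when they arrive, which is the condition the reproduction relies on.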

> Replica fetcher thread crashes due to OffsetOutOfRangeException
> ---------------------------------------------------------------
>
>                 Key: KAFKA-5634
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5634
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.11.0.0
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Critical
>              Labels: regression, reliability
>             Fix For: 0.11.0.1
>
>
> We have seen the following exception recently:
> {code}
> kafka.common.KafkaException: error processing data for partition [foo,0] offset 1459250
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:203)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:174)
>         at scala.Option.foreach(Option.scala:257)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:174)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:171)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:171)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:171)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:171)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
>         at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:169)
>         at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
> Caused by: org.apache.kafka.common.errors.OffsetOutOfRangeException: The specified offset 1459250 is higher than the high watermark 1459032 of the partition foo-0
> {code}
> The error check was added in the patch for KIP-107: https://github.com/apache/kafka/commit/8b05ad406d4cba6a75d1683b6d8699c3ab28f9d6.
> After investigation, we found that it is possible for the log start offset on the leader to
> get ahead of the high watermark on the follower after segment deletion. The check therefore
> seems incorrect. The impact of this bug is that the fetcher thread crashes on the follower
> and the broker must be restarted.
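
The failure condition described in the issue can be illustrated with a small model. This is a simplified sketch, not the Kafka source: the function and exception names are hypothetical, and the only things taken from the report are the comparison implied by the exception message and the two offsets in the stack trace.

```python
class OffsetOutOfRangeError(Exception):
    """Stand-in for Kafka's OffsetOutOfRangeException (hypothetical name)."""


def checked_log_start_offset(leader_log_start_offset, follower_high_watermark):
    """Model of the check: reject a log start offset above the local high watermark."""
    if leader_log_start_offset > follower_high_watermark:
        raise OffsetOutOfRangeError(
            "The specified offset %d is higher than the high watermark %d"
            % (leader_log_start_offset, follower_high_watermark)
        )
    return leader_log_start_offset


def follower_crashes(leader_log_start_offset, follower_high_watermark):
    """Return True if the modeled check would raise for these offsets."""
    try:
        checked_log_start_offset(leader_log_start_offset, follower_high_watermark)
    except OffsetOutOfRangeError:
        return True
    return False
```

With the offsets from the stack trace, `follower_crashes(1459250, 1459032)` returns True: segment deletion moved the leader's log start offset past the follower's high watermark, so the modeled check rejects a state that the issue describes as legitimate.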



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
