kafka-jira mailing list archives

From "Stanislav Chizhov (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (KAFKA-6003) Replication Fetcher thread for a partition with no data fails to start
Date Fri, 06 Oct 2017 09:29:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-6003?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16194347#comment-16194347 ]

Stanislav Chizhov edited comment on KAFKA-6003 at 10/6/17 9:28 AM:

Hi [~apurva]. So now there are no plans to have that fixed in as far as I can see
from the fix version of this ticket - or are there still? Can you please shed some light on this?
Thank you.

was (Author: schizhov):
Hi [~apurva]. So now there are no plans to have that fixed in as far as I can see
from the fix version of this ticket. Can you please point me to a related discussion thread
somewhere or shed some light on this here?
Thank you.

> Replication Fetcher thread for a partition with no data fails to start
> ----------------------------------------------------------------------
>                 Key: KAFKA-6003
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6003
>             Project: Kafka
>          Issue Type: Bug
>          Components: replication
>    Affects Versions:
>            Reporter: Stanislav Chizhov
>            Assignee: Apurva Mehta
>            Priority: Blocker
>             Fix For: 1.0.0
> If a partition of a topic with an idempotent producer has no data on one of the brokers, but
the data does exist on others and some of the segments for this partition have already been deleted,
the replication thread responsible for this partition on the broker which has no data for it fails
to start with an out of order sequence exception:
> {code}
> [2017-10-02 09:44:23,825] ERROR [ReplicaFetcherThread-2-4]: Error due to (kafka.server.ReplicaFetcherThread)
> kafka.common.KafkaException: error processing data for partition [stage.data.adevents.v2,20] offset 1660336429
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:203)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:174)
>         at scala.Option.foreach(Option.scala:257)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:174)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:171)
>         at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>         at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:171)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:171)
>         at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:171)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
>         at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:169)
>         at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
>         at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
> Caused by: org.apache.kafka.common.errors.OutOfOrderSequenceException: Invalid sequence number for new epoch: 0 (request epoch), 154277489 (seq. number)
> {code}
> We run kafka and we ran into a situation where one of the replication threads was
stopped for a few days, while everything else on that broker was functional. This is our staging
cluster and retention is less than a day, so everything for the partitions whose replication
thread was down was cleaned up. At the moment we have a broker which cannot start replication
for a few partitions. I was also able to reproduce this in my local test environment.
> Another possible case where this might cause real pain is a disk failure, or any situation
where deleting all the data for the partition on a broker previously helped, since the broker would
just fetch all the data from the other replicas. Now that does not work for topics with idempotent
producers. It might also affect other, non-idempotent topics if they are unlucky enough to share
the same replication fetcher thread.
> This seems to be caused by this logic: https://github.com/apache/kafka/blob/
> and might be fixed in the scope of https://issues.apache.org/jira/browse/KAFKA-5793.
> However, any hints on how to get those partitions to a fully replicated state are highly appreciated.
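
The failure described above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions and not Kafka's actual code: the class and function names (`ToyProducerState`, `ProducerEntry`, `replicate_onto_empty_follower`) and the producer id `1` are hypothetical, and the rule that the first batch seen for a producer epoch must start at sequence 0 is inferred from the exception text in the report. The sequence number is the one from the log above.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class OutOfOrderSequenceError(Exception):
    """Hypothetical Python stand-in for Kafka's OutOfOrderSequenceException."""


@dataclass
class ProducerEntry:
    epoch: int
    last_seq: int


@dataclass
class ToyProducerState:
    """Simplified per-partition producer state (illustrative, not Kafka's class)."""
    entries: Dict[int, ProducerEntry] = field(default_factory=dict)

    def append(self, producer_id: int, epoch: int, first_seq: int, last_seq: int) -> None:
        entry = self.entries.get(producer_id)
        # Assumption: an unknown producer is treated as having epoch -1,
        # so any incoming epoch counts as "new".
        current_epoch = entry.epoch if entry is not None else -1
        if epoch != current_epoch:
            # Assumed rule: the first batch of a new epoch must start at 0.
            if first_seq != 0:
                raise OutOfOrderSequenceError(
                    f"Invalid sequence number for new epoch: {epoch} "
                    f"(request epoch), {first_seq} (seq. number)"
                )
        elif first_seq != entry.last_seq + 1:
            raise OutOfOrderSequenceError(
                f"Expected sequence {entry.last_seq + 1}, got {first_seq}"
            )
        self.entries[producer_id] = ProducerEntry(epoch, last_seq)


def replicate_onto_empty_follower() -> Optional[str]:
    """Replay the leader's oldest retained batch onto a follower with no data."""
    follower = ToyProducerState()  # empty log, so no producer state at all
    try:
        # The leader already deleted the early segments, so the first batch it
        # can serve starts far beyond sequence 0.
        follower.append(producer_id=1, epoch=0,
                        first_seq=154277489, last_seq=154277489)
    except OutOfOrderSequenceError as err:
        return str(err)
    return None
```

In this model, a follower that starts from sequence 0 and receives contiguous batches appends without error, while `replicate_onto_empty_follower()` returns the same kind of message as in the stack trace, because the empty follower has no record of the producer and the leader can no longer serve the batch that started at 0.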

This message was sent by Atlassian JIRA
