kafka-dev mailing list archives

From "Stanislav Chizhov (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-6003) Replication Fetcher thread for a partition with no data fails to start
Date Mon, 02 Oct 2017 19:54:00 GMT
Stanislav Chizhov created KAFKA-6003:

             Summary: Replication Fetcher thread for a partition with no data fails to start
                 Key: KAFKA-6003
                 URL: https://issues.apache.org/jira/browse/KAFKA-6003
             Project: Kafka
          Issue Type: Bug
          Components: replication
    Affects Versions:
            Reporter: Stanislav Chizhov

If a partition of a topic written by an idempotent producer has no data on one of the brokers,
but does have data on the others, and some of the segments for this partition have already
been deleted, then the replication thread responsible for this partition on the broker that has
no data for it fails to start with an out-of-order sequence exception:
[2017-10-02 09:44:23,825] ERROR [ReplicaFetcherThread-2-4]: Error due to (kafka.server.ReplicaFetcherThread)
kafka.common.KafkaException: error processing data for partition [stage.data.adevents.v2,20] offset 1660336429
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:203)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:174)
        at scala.Option.foreach(Option.scala:257)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:174)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1.apply(AbstractFetcherThread.scala:171)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply$mcV$sp(AbstractFetcherThread.scala:171)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:171)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2.apply(AbstractFetcherThread.scala:171)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
        at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:169)
        at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:112)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:64)
Caused by: org.apache.kafka.common.errors.OutOfOrderSequenceException: Invalid sequence number for new epoch: 0 (request epoch), 154277489 (seq. number)
We run Kafka and ran into a situation where one of the replication threads was stopped
for a few days while everything else on that broker was functional. This is our staging cluster
and retention is less than a day, so at the moment we have a broker that cannot start replication
for a few partitions. I was also able to reproduce this in my local test environment.
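
Since the stopped thread went unnoticed for days here, a small log scan can help spot it sooner.
The sketch below only assumes the log line format shown in the excerpt above; the function name
and the regular expression are illustrative and are not part of Kafka.

```python
import re

# Matches lines shaped like the one in this report:
# [2017-10-02 09:44:23,825] ERROR [ReplicaFetcherThread-2-4]: Error due to (...)
# Other Kafka versions may log this differently (assumption).
FETCHER_ERROR = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\] ERROR "
    r"\[(?P<thread>ReplicaFetcherThread-\d+-\d+)\]: Error due to"
)


def failed_fetcher_threads(lines):
    """Return {thread name: timestamp of its first 'Error due to' line}."""
    failures = {}
    for line in lines:
        match = FETCHER_ERROR.match(line)
        if match:
            # Keep only the first failure seen for each thread.
            failures.setdefault(match.group("thread"), match.group("timestamp"))
    return failures
```

Passing an open server log file to `failed_fetcher_threads` yields one entry per fetcher thread
that logged this error, which can then be compared against the partitions that are under-replicated.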
Another possible use case is a disk failure, or any situation where deleting all the data for
a partition on a broker used to help, since the broker would simply fetch all the data from the
other replicas. This no longer works for topics with idempotent producers. It might also affect
other, non-idempotent topics if they are unlucky enough to share the same replication fetcher thread.

This seems to be caused by this logic: https://github.com/apache/kafka/blob/

and might be fixed in the scope of https://issues.apache.org/jira/browse/KAFKA-5793.
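
To illustrate the suspected failure mode, here is a simplified model of a per-producer sequence
check. It is a sketch based only on the error message above, not the Kafka code: the class and
function names are hypothetical, and the rule "a producer epoch not seen before must start at
sequence 0" is an assumption inferred from that message.

```python
class OutOfOrderSequenceError(Exception):
    """Stand-in for org.apache.kafka.common.errors.OutOfOrderSequenceException."""


class ProducerState:
    """Hypothetical per-producer state kept by a replica for one partition."""

    def __init__(self):
        self.epoch = None     # no epoch seen yet, e.g. on a replica with an empty log
        self.last_seq = None


def validate_append(state, epoch, first_seq, last_seq):
    """Check one batch against the replica's state, then record it."""
    new_epoch = state.epoch is None or epoch != state.epoch
    if new_epoch:
        # Assumed rule: the first batch seen for an epoch must start at 0.
        if first_seq != 0:
            raise OutOfOrderSequenceError(
                f"Invalid sequence number for new epoch: {epoch} (request epoch), "
                f"{first_seq} (seq. number)"
            )
    elif first_seq != state.last_seq + 1:
        raise OutOfOrderSequenceError(
            f"Expected sequence {state.last_seq + 1}, got {first_seq}"
        )
    state.epoch = epoch
    state.last_seq = last_seq


def replicate(batches):
    """Replay (epoch, first_seq, last_seq) batches into a replica with no data."""
    state = ProducerState()
    for epoch, first_seq, last_seq in batches:
        validate_append(state, epoch, first_seq, last_seq)
    return state
```

Under this model, a replica that receives the log from its very first batch accepts everything.
An empty replica whose first fetched batch comes after the deleted segments, for example
`replicate([(0, 154277489, 154277498)])`, is rejected on that first batch, which matches the
symptom reported here.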

However, any hints on how to get those partitions to a fully replicated state are highly appreciated.
Any hints on how to get this broker 

This message was sent by Atlassian JIRA