kafka-dev mailing list archives

From "Jason Gustafson (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (KAFKA-6975) AdminClient.deleteRecords() may cause replicas unable to fetch from beginning
Date Thu, 14 Jun 2018 15:39:00 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-6975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Gustafson resolved KAFKA-6975.
       Resolution: Fixed
    Fix Version/s:     (was: 1.1.1)
                       (was: 1.0.2)

> AdminClient.deleteRecords() may cause replicas unable to fetch from beginning
> -----------------------------------------------------------------------------
>                 Key: KAFKA-6975
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6975
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 1.1.0, 1.0.1
>            Reporter: Anna Povzner
>            Assignee: Anna Povzner
>            Priority: Blocker
>             Fix For: 2.0.0
> AdminClient.deleteRecords(beforeOffset(offset)) sets the log start offset to the requested
offset. If the requested offset falls in the middle of a batch, a replica cannot fetch from
that offset, because the batch that contains it begins at a lower base offset.
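The "middle of the batch" condition can be illustrated with a small stand-alone sketch. This is a toy model, not broker code: the SketchBatch and BatchAlignment classes and the isMidBatch helper are hypothetical names introduced here, and each batch is assumed to cover the inclusive offset range baseOffset..lastOffset.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a record batch covering offsets baseOffset..lastOffset (inclusive).
final class SketchBatch {
    final long baseOffset;
    final long lastOffset;

    SketchBatch(long baseOffset, long lastOffset) {
        this.baseOffset = baseOffset;
        this.lastOffset = lastOffset;
    }
}

final class BatchAlignment {
    // Returns true if the offset lies inside a batch but after its base offset,
    // which is the case the issue describes as "in the middle of the batch".
    static boolean isMidBatch(List<SketchBatch> log, long offset) {
        for (SketchBatch b : log) {
            if (offset > b.baseOffset && offset <= b.lastOffset) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<SketchBatch> log = Arrays.asList(
                new SketchBatch(0, 4), new SketchBatch(5, 9), new SketchBatch(10, 14));
        System.out.println(isMidBatch(log, 7)); // offset 7 is inside the batch 5..9
        System.out.println(isMidBatch(log, 5)); // offset 5 is a batch boundary
    }
}
```

In this model, a deleteRecords() request for offset 7 would leave the log start offset inside the batch 5..9, while a request for offset 5 would land on a batch boundary.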
> One use case where this could cause problems is replica reassignment. Suppose we have
a topic partition with 3 initial replicas, and at some point the user issues AdminClient.deleteRecords()
for an offset that falls in the middle of a batch. That offset now becomes the log start offset
for this topic partition. Suppose that at some later time the user starts a partition reassignment
to 3 new replicas. The new replicas (followers) will start with HW = 0 and will try to fetch
from 0, then get "out of order offset" because 0 < log start offset (LSO). The follower
will be able to reset its offset to the LSO of the leader and fetch from LSO, but the leader will
respond with a batch whose base offset is < LSO. This will cause "out of order offset" on the
follower, which will stop the fetcher thread. The end result is that the new replicas will not be
able to start fetching until LSO moves to an offset that is not in the middle of a batch, and
the reassignment will be stuck for possibly a very long time.
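The sequence described above can be walked through with a toy simulation. It is only a sketch under stated assumptions, not Kafka's replica fetcher: ReassignmentSketch, Outcome, and replicate are hypothetical names, each long[] stands for one batch as {baseOffset, lastOffset}, and the follower is assumed to reject any batch whose base offset is below the offset it asked for.

```java
import java.util.Arrays;
import java.util.List;

final class ReassignmentSketch {
    enum Outcome { CAUGHT_UP, FETCHER_STOPPED }

    // Simulates a new follower replicating from a leader whose log start offset
    // may have been moved by deleteRecords(). Toy model only.
    static Outcome replicate(List<long[]> leaderBatches, long leaderLogStartOffset) {
        long fetchOffset = 0L; // a new replica starts by fetching from 0
        if (fetchOffset < leaderLogStartOffset) {
            // The leader rejects the fetch; the follower resets to the leader's log start offset.
            fetchOffset = leaderLogStartOffset;
        }
        for (long[] batch : leaderBatches) {
            long base = batch[0];
            long last = batch[1];
            if (last < fetchOffset) {
                continue; // batch lies entirely below the fetch offset
            }
            // The leader returns whole batches, so this batch may begin below fetchOffset.
            if (base < fetchOffset) {
                return Outcome.FETCHER_STOPPED; // assumed follower-side offset check fails
            }
            fetchOffset = last + 1; // batch appended, move to the next offset
        }
        return Outcome.CAUGHT_UP;
    }

    public static void main(String[] args) {
        List<long[]> batches = Arrays.asList(
                new long[] {0, 4}, new long[] {5, 9}, new long[] {10, 14});
        System.out.println(replicate(batches, 7)); // log start offset inside batch 5..9
        System.out.println(replicate(batches, 5)); // log start offset on a batch boundary
    }
}
```

In this model the follower only gets stuck when the leader's log start offset falls inside a batch, which matches the observation that the reassignment can proceed once LSO moves to a batch boundary.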

This message was sent by Atlassian JIRA
