kafka-jira mailing list archives

From "Dong Lin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (KAFKA-6636) ReplicaFetcherThread should not die if hw < 0
Date Mon, 12 Mar 2018 20:44:02 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-6636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dong Lin updated KAFKA-6636:
----------------------------
    Description: 
ReplicaFetcherThread can die in the following scenario:

 

1) Partition P1 has replica set size 1. Broker A is the leader. The segment is empty and log
start offset is 100

2) User executes partition reassignment to change replica set from {A} to {B, C}

3) Broker B starts ReplicaFetcherThread, which triggers handleOffsetOutOfRange(), truncates
the log fully and starts at offset 100. At this moment its high watermark is still 0 (or -1). Same
for broker C.

4) Broker B sends a FetchRequest to A at offset 100. Broker A immediately adds broker B to the
ISR set, and the controller moves leadership to broker B.

5) Broker B handles LeaderAndIsrRequest to become leader. It calls `leaderReplica.convertHWToLocalOffsetMetadata()`
to initialize its HW. Since its HW was smaller than logStartOffset=100, now its HW will be
overridden to LogOffsetMetadata.UnknownOffsetMetadata, i.e. -1.

6) Broker C handles LeaderAndIsrRequest to fetch from broker B. Broker C updates its HW to
the FetchResponse's HW, i.e. -1. Then broker C calls replica.maybeIncrementLogStartOffset(leaderLogStartOffset)
where leaderLogStartOffset=100. This causes an exception because leaderLogStartOffset > HW.
The exception is unhandled, so the ReplicaFetcherThread exits.
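
The offset arithmetic in steps 5 and 6 can be sketched with a small toy model. This is only an
illustration under stated assumptions: ReplicaModel, its fields, and the use of
IllegalStateException are hypothetical stand-ins, not Kafka's actual classes or exception
types. The two methods mirror only the behavior described above (HW below the log start offset
becomes -1, and a log start offset above the HW is rejected).

```java
// Toy model of steps 5 and 6; these are NOT Kafka's real classes.
final class ReplicaModel {
    // Stands in for LogOffsetMetadata.UnknownOffsetMetadata (offset -1).
    static final long UNKNOWN_OFFSET = -1L;

    long logStartOffset;
    long highWatermark;

    ReplicaModel(long logStartOffset, long highWatermark) {
        this.logStartOffset = logStartOffset;
        this.highWatermark = highWatermark;
    }

    // Step 5 (assumed behavior): a HW below the log start offset is
    // replaced by the unknown offset.
    void convertHwToLocalOffsetMetadata() {
        if (highWatermark < logStartOffset) {
            highWatermark = UNKNOWN_OFFSET;
        }
    }

    // Step 6 (assumed behavior): reject a log start offset above the HW.
    void maybeIncrementLogStartOffset(long newLogStartOffset) {
        if (newLogStartOffset > highWatermark) {
            throw new IllegalStateException("log start offset " + newLogStartOffset
                    + " is larger than high watermark " + highWatermark);
        }
        if (newLogStartOffset > logStartOffset) {
            logStartOffset = newLogStartOffset;
        }
    }

    public static void main(String[] args) {
        // Broker B becomes leader with HW 0 and log start offset 100.
        ReplicaModel brokerB = new ReplicaModel(100L, 0L);
        brokerB.convertHwToLocalOffsetMetadata();

        // Broker C adopts the leader's HW, then applies the leader's log start offset.
        ReplicaModel brokerC = new ReplicaModel(100L, 0L);
        brokerC.highWatermark = brokerB.highWatermark;
        try {
            brokerC.maybeIncrementLogStartOffset(brokerB.logStartOffset);
            System.out.println("fetch handled");
        } catch (IllegalStateException e) {
            // In the scenario above nothing catches this, so the thread exits.
            System.out.println("fetcher thread would exit: " + e.getMessage());
        }
    }
}
```

In this model the check fails because 100 > -1. The try/catch in main exists only to print the
outcome; in the reported scenario the exception is not handled.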

> ReplicaFetcherThread should not die if hw < 0
> ---------------------------------------------
>
>                 Key: KAFKA-6636
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6636
>             Project: Kafka
>          Issue Type: Improvement
>            Reporter: Dong Lin
>            Assignee: Dong Lin
>            Priority: Major
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
