kafka-dev mailing list archives

From "Jason Gustafson (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-7128) Lagging high watermark can lead to committed data loss after ISR expansion
Date Mon, 02 Jul 2018 19:59:00 GMT
Jason Gustafson created KAFKA-7128:

             Summary: Lagging high watermark can lead to committed data loss after ISR expansion
                 Key: KAFKA-7128
                 URL: https://issues.apache.org/jira/browse/KAFKA-7128
             Project: Kafka
          Issue Type: Bug
            Reporter: Jason Gustafson
            Assignee: Jason Gustafson

Some model checking exposed a weakness in the ISR expansion logic. We know that the high watermark
can go backwards after a leader failover, but we may not have known that this can lead to
the loss of committed data. 

Say we have three replicas: r1, r2, and r3. Initially, the ISR consists of (r1, r2) and the
leader is r1. r3 is a new replica which has not begun fetching. The data up to offset 10 has
been committed to the ISR. Here is the initial state:

ISR: (r1, r2)
Leader: r1
r1: [hw=10, leo=10]
r2: [hw=5, leo=10]
r3: [hw=0, leo=0]

Replica 1 then initiates shutdown (or fails) and leaves the ISR, which makes r2 the new leader.
The high watermark on r2 is still lagging behind r1's.

ISR: (r2)
Leader: r2
r1 (offline): [hw=10, leo=10]
r2: [hw=5, leo=10]
r3: [hw=0, leo=0]

Replica 3 then catches up to the high watermark on r2 and joins the ISR. Perhaps its high watermark
is lagging behind r2's, but this is unimportant.

ISR: (r2, r3)
Leader: r2
r1 (offline): [hw=10, leo=10]
r2: [hw=5, leo=10]
r3: [hw=0, leo=5]

Now r2 fails, and r3 is elected leader as the only remaining member of the ISR. The committed data
from offsets 5 to 10 has been lost.

ISR: (r3)
Leader: r3
r1 (offline): [hw=10, leo=10]
r2 (offline): [hw=5, leo=10]
r3: [hw=0, leo=5]
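
The sequence above can be replayed with a small model. This is only an illustrative sketch, not Kafka code: the Replica class and the may_join_isr helper are hypothetical names, and the expansion rule is the simplified one described in this report (the follower reaches the leader's local high watermark).

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    hw: int   # high watermark as known locally by this replica
    leo: int  # log end offset


def may_join_isr(leader: Replica, follower: Replica) -> bool:
    # Expansion rule as described in this report (hypothetical helper):
    # the follower only needs to reach the leader's local high watermark.
    return follower.leo >= leader.hw


r1 = Replica("r1", hw=10, leo=10)
r2 = Replica("r2", hw=5, leo=10)
r3 = Replica("r3", hw=0, leo=0)
committed = r1.hw  # data up to offset 10 was committed under leader r1

# r1 leaves the ISR; r2 becomes leader with a lagging high watermark.
leader, isr = r2, {"r2"}

# r3 fetches up to r2's high watermark and is admitted to the ISR.
r3.leo = leader.hw
if may_join_isr(leader, r3):
    isr.add("r3")

# r2 fails; r3 is the only ISR member left and becomes leader.
isr.discard("r2")
leader = r3

lost = (leader.leo, committed)
print(f"ISR={sorted(isr)} leader={leader.name} lost offsets {lost[0]} to {lost[1]}")
```

The model ignores fetch timing and high watermark propagation; it only shows that the admission check alone is enough to reach the final state listed above.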

The bug is that we allowed r3 into the ISR as soon as it reached the leader's local high watermark.
Since the new leader was a follower during the previous leader's epoch, it does not know the true
high watermark for that epoch. It should therefore not allow a replica to join the ISR until that
replica has caught up to an offset within the new leader's own epoch.
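
One way to picture the stricter rule is as an extra condition on the admission check. The sketch below is an assumption about how such a check could look, not the actual patch: can_join_isr and leader_epoch_start_offset are hypothetical names, with the latter standing for the leader's log end offset at the moment it became leader.

```python
def can_join_isr(follower_leo: int, leader_hw: int, leader_epoch_start_offset: int) -> bool:
    """Hypothetical stricter ISR expansion check.

    The follower must reach the leader's local high watermark AND the first
    offset of the current leader's epoch, so that everything that could have
    been committed under the previous leader is present on the follower.
    """
    return follower_leo >= leader_hw and follower_leo >= leader_epoch_start_offset


# Values from the scenario in this report: r2 became leader with leo=10 and
# hw=5, so its epoch is assumed to start at offset 10.
print(can_join_isr(follower_leo=5, leader_hw=5, leader_epoch_start_offset=10))
print(can_join_isr(follower_leo=10, leader_hw=5, leader_epoch_start_offset=10))
```

Under this check r3 with leo=5 is rejected and is admitted only once it has replicated up to offset 10, at which point a failure of r2 no longer loses the committed data.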

Note this is related to https://cwiki.apache.org/confluence/display/KAFKA/KIP-207%3A+Offsets+returned+by+ListOffsetsResponse+should+be+monotonically+increasing+even+during+a+partition+leader+change
