kafka-dev mailing list archives

From "GEORGE LI (Jira)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-8903) allow the new replica (offset 0) to catch up with current leader using latest offset
Date Thu, 12 Sep 2019 21:43:00 GMT
GEORGE LI created KAFKA-8903:

             Summary: allow the new replica (offset 0) to catch up with current leader using latest offset
                 Key: KAFKA-8903
                 URL: https://issues.apache.org/jira/browse/KAFKA-8903
             Project: Kafka
          Issue Type: Improvement
          Components: config, core
    Affects Versions: 2.3.0, 1.1.1, 1.1.0
            Reporter: GEORGE LI
            Assignee: GEORGE LI

It is very common (and sometimes frequent) for a broker to have hardware failures (disk, memory,
CPU, NIC) in a large Kafka deployment with thousands of brokers. The failed host is replaced
by a new one with the same "broker.id", and the new broker starts up empty. All of its topic/partitions
start at offset 0. If the current leader's start offset is > 0, the replacement broker
starts fetching the partition from the leader's earliest (start) offset.

If the number and size of the partitions hosted on this broker are large,
it takes quite some time for the ReplicaFetcher threads to pull the data from all the leaders in
the cluster, and the extra load on those brokers/leaders can affect latency and other
performance metrics. Once the replacement broker has caught up, preferred leader election can
be run to move the leaders back to this broker.

To avoid the performance impact above and make the failed-broker replacement process much easier
and more scalable, we are proposing a new dynamic config {{replica.start.offset.strategy}}.
The default is Earliest, and it can be dynamically set to Latest for a broker (or brokers).
If it is set to Latest, then when the empty broker starts up, all of its partitions start
from the latest offset (LEO, LogEndOffset) of the current leader. The replacement broker's
replicas are therefore in the ISR with 0 TotalLag/MaxLag and 0 URP almost instantly.
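
To make the proposed behaviour concrete, here is a minimal sketch of how an empty replica might choose its first fetch offset under each strategy. It is only an illustration, not broker code: the names ReplicaStartOffsetStrategy and initial_fetch_offset are hypothetical, and the handling of a non-empty replica is deliberately simplified.

```python
from enum import Enum


class ReplicaStartOffsetStrategy(Enum):
    """Hypothetical values for the proposed replica.start.offset.strategy config."""
    EARLIEST = "Earliest"
    LATEST = "Latest"


def initial_fetch_offset(follower_leo, leader_log_start, leader_leo, strategy):
    """Return the offset a follower would start fetching from (illustrative only).

    follower_leo     -- the follower's own log end offset (0 for an empty replica)
    leader_log_start -- the leader's earliest (start) offset
    leader_leo       -- the leader's log end offset (LEO)
    """
    # Simplification: a replica that already has data resumes from its own LEO.
    # Out-of-range handling and truncation are ignored in this sketch.
    if follower_leo > 0:
        return follower_leo
    # Empty replica with Latest: skip the backlog and start at the leader's LEO.
    if strategy is ReplicaStartOffsetStrategy.LATEST:
        return leader_leo
    # Empty replica with Earliest (the default): replicate the whole log.
    return leader_log_start
```

With a leader whose start offset is 1000 and LEO is 5000, an empty replica would begin at 1000 under Earliest and at 5000 under Latest, which is why its lag would be close to 0 right away in the second case.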

For preferred leader election, we can wait until the retention time has passed and this
replacement broker has been replicating for long enough. The better/safer approach is to enable
the Preferred Leader Blacklist mentioned in KAFKA-8638 / KIP-491, so that until the replacement
broker has completely caught up, its leadership determination priority is moved to the lowest.
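
The idea of moving a broker's leadership priority to the lowest can be sketched as follows. This is an assumption-laden illustration and not the KIP-491 implementation: pick_preferred_leader and the deprioritized set are hypothetical names, and real leader election involves more state than is shown here.

```python
def pick_preferred_leader(assigned_replicas, isr, deprioritized):
    """Pick a leader from the in-sync replicas, trying deprioritized brokers last.

    assigned_replicas -- broker ids in assignment order (first is the preferred replica)
    isr               -- set of broker ids currently in sync
    deprioritized     -- set of broker ids whose leadership priority is lowest
    """
    # Only in-sync replicas are eligible in this sketch.
    in_sync = [b for b in assigned_replicas if b in isr]
    # sorted() is stable, so assignment order is preserved within each group;
    # False sorts before True, so non-deprioritized brokers come first.
    ordered = sorted(in_sync, key=lambda b: b in deprioritized)
    return ordered[0] if ordered else None
```

For an assignment of [1, 2, 3] with all replicas in sync, broker 1 would normally be chosen; if broker 1 is the recently replaced broker and is deprioritized, broker 2 is chosen instead, and broker 1 is still picked when it is the only in-sync replica.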

This message was sent by Atlassian Jira
