kafka-dev mailing list archives

From "Ivan Babrou (JIRA)" <j...@apache.org>
Subject [jira] [Created] (KAFKA-6414) Inverse replication for replicas that are far behind
Date Tue, 02 Jan 2018 02:14:00 GMT
Ivan Babrou created KAFKA-6414:
----------------------------------

             Summary: Inverse replication for replicas that are far behind
                 Key: KAFKA-6414
                 URL: https://issues.apache.org/jira/browse/KAFKA-6414
             Project: Kafka
          Issue Type: Bug
          Components: replication
            Reporter: Ivan Babrou


Let's suppose the following starting point:

* 1 topic
* 1 partition
* 1 reader
* 24h retention period
* leader outbound bandwidth is 3x the inbound bandwidth (1x replication + 1x reader + 1x slack = total outbound)

In this scenario, when a replica fails and needs to be rebuilt from scratch, it can catch up at 2x the inbound bandwidth (1x regular replication + 1x slack).

2x catch-up speed means the replica will reach the point where the leader is now in 24h / 2x = 12h. However, in those 12h the oldest 12h of the topic will fall off the retention cliff and be deleted. There is absolutely no use for this data; it will never be read from the replica in any scenario. And that is not even counting the additional 12h of data that accumulated since we started and still needs to be replicated.
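
For concreteness, here is a back-of-the-envelope sketch of that arithmetic (values are purely illustrative and only mirror the example above, nothing here reflects actual broker behaviour):

{code:java}
// Back-of-the-envelope numbers for the scenario above; all values are illustrative.
public class CatchUpMath {
    public static void main(String[] args) {
        double retentionHours = 24.0; // topic retention
        double catchUpSpeed = 2.0;    // catch-up bandwidth relative to inbound (1x replication + 1x slack)

        // Time to copy the existing 24h backlog at 2x inbound speed.
        double hoursToCopyBacklog = retentionHours / catchUpSpeed; // 12h

        // While we were copying, the same number of hours of the oldest data fell off
        // the retention cliff: replicated for nothing, never readable from this replica.
        double hoursWasted = hoursToCopyBacklog; // 12h

        // And the same amount of brand-new data accumulated that still has to be
        // replicated before the follower can rejoin the ISR.
        double hoursStillBehind = hoursToCopyBacklog; // 12h

        System.out.printf("backlog copied in %.0fh, %.0fh of it wasted, %.0fh still to replicate%n",
                hoursToCopyBacklog, hoursWasted, hoursStillBehind);
    }
}
{code}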

My suggestion is to refill sufficiently out-of-sync replicas backwards from the tip: newest segments first, oldest segments last. Then we can stop when we hit the retention cliff and replicate far less data. The lower the ratio of catch-up bandwidth to inbound bandwidth, the higher the returns. This also puts a hard cap on recovery time: it will be no higher than the retention period as long as catch-up speed is >1x (if it's less, you're forever out of ISR anyway).
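
A minimal sketch of what "backwards from the tip" could look like, assuming the follower knows the leader's segment base offsets and its log start offset (the retention cliff). The names below are made up for illustration and do not correspond to actual ReplicaFetcher code; segments that straddle the cliff are simply skipped here to keep it short:

{code:java}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of the proposed "newest segments first" catch-up order.
public class InverseCatchUpOrder {
    static List<Long> planFetchOrder(List<Long> segmentBaseOffsets, long logStartOffset) {
        List<Long> order = new ArrayList<>();
        List<Long> newestFirst = new ArrayList<>(segmentBaseOffsets);
        newestFirst.sort(Collections.reverseOrder());
        for (long baseOffset : newestFirst) {
            // Stop at the retention cliff: anything below the log start offset would be
            // deleted before any reader could use it, so it is never fetched at all.
            if (baseOffset < logStartOffset) {
                break;
            }
            order.add(baseOffset);
        }
        return order;
    }

    public static void main(String[] args) {
        // Segments start at offsets 0, 1000, 2000, 3000; retention has already moved the
        // log start offset to 1500, so only the two newest segments are fetched.
        System.out.println(planFetchOrder(List.of(0L, 1000L, 2000L, 3000L), 1500L)); // [3000, 2000]
    }
}
{code}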

What exactly "sufficiently out of sync" means in terms of lag is a topic for a debate. The
default segment size is 1GiB, I'd say that being >1 full segments behind probably warrants
this.
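
For illustration only, the trigger could be as simple as comparing the follower's byte lag against the configured segment size; the method below is hypothetical, not an existing broker hook:

{code:java}
// Hypothetical trigger: use the newest-first catch-up path only when the follower is
// more than one full segment (log.segment.bytes, 1GiB by default) behind the leader.
static boolean shouldCatchUpBackwards(long replicaLagBytes, long segmentSizeBytes) {
    return replicaLagBytes > segmentSizeBytes;
}
{code}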

As of now, the only workaround for slow recovery appears to be reducing the retention period, which doesn't seem very friendly.



