kafka-dev mailing list archives

From "Gwen Shapira (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-3064) Improve resync method to waste less time and data transfer
Date Tue, 04 Oct 2016 00:09:21 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-3064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15543833#comment-15543833 ]

Gwen Shapira commented on KAFKA-3064:

I'm wondering if the addition of replica throttling will help address this.

> Improve resync method to waste less time and data transfer
> ----------------------------------------------------------
>                 Key: KAFKA-3064
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3064
>             Project: Kafka
>          Issue Type: Improvement
>          Components: controller, network
>    Affects Versions:
>            Reporter: Michael Graff
>            Assignee: Neha Narkhede
> We have several topics which are large (65 GB per partition) with 12 partitions.
> Data rates into each topic vary, but in general each one has its own rate.
> After a RAID rebuild, we are pulling all the data over to the newly rebuilt
> RAID. This takes forever and has yet to complete after nearly 8 hours.
> Here are my observations:
> (1)  The Kafka broker seems to pull from all topics on all partitions at the
> same time, starting at the oldest message.
> (2)  When the total available disk bandwidth is divided across all partitions
> (really, only 48 of which hold significant amounts of data, about 65 GB * 12 =
> 780 GB per topic), the per-partition share is lower than the ingest rate for
> 36 of the 48 partitions.
> (3)  The effect of (2) is that one topic SLOWLY catches up, while the other 4
> topics continue to retrieve data at 75% of the bandwidth, just to toss it away
> because the source broker has already discarded it.
> (4)  Eventually that one topic catches up, and the remaining bandwidth is then
> divided among the remaining 36 partitions, one group of which starts to catch
> up again.
> What I want to see is a way to say “don’t transfer more than X partitions at
> the same time” and ideally a priority rule that says, “Transfer partitions you
> are responsible for first, then transfer ones you are not. Also, transfer
> these first, then those, but no more than 1 topic at a time.”
> What I REALLY want is for Kafka to track the new data (track the head of the
> log) and then ask for the tail in chunks. Ideally this would ask the source,
> “what is the next logical older starting point?” and then start there. This
> way, the transfer basically becomes a file transfer of the log stored on the
> source’s disk. Once that block is retrieved, it moves on to the next oldest.
> This way, there is almost zero waste as both the head and tail grow, and only
> the final chunk of the tail runs the risk of being lost. Thus, bandwidth is
> not significantly wasted.
> All this changes the ISR check to “am I caught up on head AND tail?”, where
> the tail part is only implied right now.
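Observation (2) can be made concrete with a small simulation of an even bandwidth split across lagging partitions. All numbers here are invented for illustration; they are not taken from the issue.

```python
# Hypothetical illustration: under a fair per-partition split of replication
# bandwidth, most partitions cannot make net progress until the few that can
# finish and free up their share.

TOTAL_BW = 480.0                      # total resync bandwidth, MB/s (assumed)
INGEST = [5.0] * 12 + [12.0] * 36     # per-partition produce rate, MB/s (assumed)

active = list(range(len(INGEST)))     # partitions still behind the head
rounds = []
while active:
    share = TOTAL_BW / len(active)    # even split across lagging partitions
    # partitions whose ingest rate is below their share shrink their lag;
    # the rest tread water until bandwidth frees up
    catching_up = [p for p in active if INGEST[p] < share]
    rounds.append((len(active), share, len(catching_up)))
    if not catching_up:
        break                         # nobody can make net progress
    active = [p for p in active if p not in catching_up]

for n_active, share, n_catching in rounds:
    print(f"{n_active} lagging, {share:.1f} MB/s each, {n_catching} making progress")
```

With these assumed rates, only 12 of the 48 partitions make progress in the first phase; the other 36 catch up only after those finish, which matches the staged behavior the reporter describes.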
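The chunked catch-up the reporter asks for can be sketched as follows. The names (`resync`, the dict-of-offsets log) are hypothetical and not Kafka APIs; this only illustrates the idea of tracking the head while copying the tail in chunk-sized blocks.

```python
# Minimal sketch of chunked tail catch-up: the follower keeps the head of the
# log current via normal fetching (not shown) while pulling the historical
# tail in contiguous chunks, oldest-first.

def resync(source_log, chunk_size):
    """Build a replica of source_log by copying the tail in fixed-size chunks."""
    replica = {}
    head = len(source_log)            # offsets >= head arrive via normal fetch
    tail = 0                          # oldest offset still needed
    while tail < head:
        # ask the source for the next chunk boundary (the reporter's "next
        # logical older starting point"); here it is simply a fixed-size block
        end = min(tail + chunk_size, head)
        for offset in range(tail, end):
            replica[offset] = source_log[offset]
        tail = end
        # only the chunk currently in flight can be lost to retention;
        # everything already copied is contiguous and safe
    return replica

log = {i: f"msg-{i}" for i in range(10)}
copy = resync(log, chunk_size=4)
```

Because each chunk is a contiguous block of the on-disk log, the transfer degenerates into a file copy, and retention can only invalidate the single chunk in flight rather than data already fetched and discarded, as happens today.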

This message was sent by Atlassian JIRA
