kafka-jira mailing list archives

From "Jeff Widman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KAFKA-4084) automated leader rebalance causes replication downtime for clusters with too many partitions
Date Mon, 30 Oct 2017 22:36:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-4084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16225870#comment-16225870 ]

Jeff Widman commented on KAFKA-4084:

[~junrao] any ballpark quantification to "much faster"? 

Are we talking 2x, 10x, or 100x faster?

When you say "batches the requests", I'm not sure what the batch size is: whether it applies
all changes as a single batch or splits them into multiple batches. That makes it hard to
guesstimate the expected performance impact.

> automated leader rebalance causes replication downtime for clusters with too many partitions
> --------------------------------------------------------------------------------------------
>                 Key: KAFKA-4084
>                 URL: https://issues.apache.org/jira/browse/KAFKA-4084
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions:,,,,
>            Reporter: Tom Crayford
>              Labels: reliability
>             Fix For: 1.1.0
> If you enable {{auto.leader.rebalance.enable}} (which is on by default), and you have
a cluster with many partitions, there is a severe amount of replication downtime following
a restart. This causes `UnderReplicatedPartitions` to fire, and replication is paused.
> This is because the current automated leader rebalance mechanism changes leaders for
*all* imbalanced partitions at once, instead of doing it gradually. This effectively stops
all replica fetchers in the cluster (assuming there are enough imbalanced partitions), and
restarts them. This can take minutes on busy clusters, during which no replication is happening
and user data is at risk. Clients with {{acks=-1}} also see issues at this time, because replication
is effectively stalled.
> To quote Todd Palino from the mailing list:
> bq. There is an admin CLI command to trigger the preferred replica election manually.
There is also a broker configuration “auto.leader.rebalance.enable” which you can set
to have the broker automatically perform the PLE when needed. DO NOT USE THIS OPTION. There
are serious performance issues when doing so, especially on larger clusters. It needs some
development work that has not been fully identified yet.
> This setting is extremely useful for smaller clusters, but with high partition counts
causes the huge issues stated above.
> One potential fix could be adding a new configuration for the number of partitions to
rebalance leaders for at once, and to *stop* once that number of leader rebalances
is in flight, until they're done. There may be better mechanisms, and I'd love to hear if
anybody has any ideas.
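
The throttling idea proposed in the description could be sketched roughly as follows. This is only an illustration in Python, not Kafka controller code: the function name, the max_in_flight parameter, and the two callbacks are all hypothetical, and no such configuration is confirmed by this thread. It starts elections until the cap is reached, then waits for some to finish before starting more.

```python
from collections import deque
from typing import Callable, Iterable, List, Set


def throttled_leader_rebalance(
    imbalanced: Iterable[str],
    max_in_flight: int,
    start_election: Callable[[str], None],
    wait_for_completed: Callable[[Set[str]], Set[str]],
) -> List[List[str]]:
    """Run leader elections with at most max_in_flight pending at once.

    Hypothetical sketch: every name here is an assumption for illustration.
    wait_for_completed is expected to block until at least one of the given
    in-flight partitions has finished, and to return the finished ones.
    Returns the groups ("waves") of partitions that were started together.
    """
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")

    pending = deque(imbalanced)
    in_flight: Set[str] = set()
    waves: List[List[str]] = []

    while pending or in_flight:
        # Start new elections only while we are below the cap.
        wave: List[str] = []
        while pending and len(in_flight) < max_in_flight:
            partition = pending.popleft()
            start_election(partition)
            in_flight.add(partition)
            wave.append(partition)
        if wave:
            waves.append(wave)

        # Stop and wait until some in-flight elections are done.
        in_flight -= wait_for_completed(set(in_flight))

    return waves
```

For example, with five imbalanced partitions, a cap of 2, and a wait callback that reports everything in flight as finished, the elections would be started in three waves of 2, 2, and 1 partitions instead of all five at once.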

This message was sent by Atlassian JIRA
