kafka-jira mailing list archives

From "Kyle Ambroff-Kao (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (KAFKA-6469) ISR change notification queue can prevent controller from making progress
Date Tue, 23 Jan 2018 04:16:01 GMT

     [ https://issues.apache.org/jira/browse/KAFKA-6469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kyle Ambroff-Kao reassigned KAFKA-6469:
---------------------------------------

    Assignee: Kyle Ambroff-Kao

> ISR change notification queue can prevent controller from making progress
> -------------------------------------------------------------------------
>
>                 Key: KAFKA-6469
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6469
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Kyle Ambroff-Kao
>            Assignee: Kyle Ambroff-Kao
>            Priority: Major
>
> When writes to /isr_change_notification in ZooKeeper (which is effectively a queue
of ISR change events for the controller) happen at a rate high enough that the node with a
watch can't dequeue them, the trouble starts.
> The watcher kafka.controller.IsrChangeNotificationListener is fired in the controller
when a new entry is written to /isr_change_notification, and the zkclient library sends a
GetChildrenRequest to ZooKeeper to fetch all child znodes.
> We've seen this happen in one of our test clusters as the partition count started to
climb north of 60k per broker. We had brokers writing child nodes under /isr_change_notification
that were larger than the jute.maxbuffer size in ZooKeeper (1MB), causing the ZooKeeper server
to drop the controller's session, effectively bricking the cluster.
> This can be partially mitigated by chunking ISR notifications to increase the maximum
number of partitions a broker can host.
>  
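The chunking mitigation mentioned in the description can be illustrated with a rough
sketch. This is not the patch for KAFKA-6469 and does not touch ZooKeeper; it only
shows the arithmetic of splitting one oversized notification payload into several
smaller ones. The JSON layout, the function names, the 5,000-partition batch size and
the synthetic topic names are all assumptions made for illustration, and the 1 MB
limit is taken from the figure quoted in the report.

```python
import json

# Assumed limit, following the 1MB jute.maxbuffer figure quoted in the report.
JUTE_MAXBUFFER_BYTES = 1024 * 1024


def encode_notification(partitions):
    """Serialize (topic, partition) pairs into one payload (layout is illustrative)."""
    body = {
        "version": 1,
        "partitions": [{"topic": t, "partition": p} for t, p in partitions],
    }
    return json.dumps(body, separators=(",", ":")).encode("utf-8")


def chunk_notifications(partitions, max_per_chunk):
    """Split the partition list into fixed-size batches, one payload per batch."""
    if max_per_chunk <= 0:
        raise ValueError("max_per_chunk must be positive")
    return [
        encode_notification(partitions[i:i + max_per_chunk])
        for i in range(0, len(partitions), max_per_chunk)
    ]


# Hypothetical broker on which the ISR of 60,000 partitions changes at once.
changed = [(f"topic-{i // 100}", i % 100) for i in range(60_000)]

single = encode_notification(changed)                      # one oversized payload
chunks = chunk_notifications(changed, max_per_chunk=5_000)  # twelve smaller payloads

print(len(single) > JUTE_MAXBUFFER_BYTES)
print(len(chunks), max(len(c) for c in chunks) < JUTE_MAXBUFFER_BYTES)
```

With these made-up numbers the single payload exceeds the limit while each chunk stays
well under it. Chunking only bounds the size of each child node; it does not reduce the
number of children the controller has to list, which is presumably why the report calls
the mitigation partial.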



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
