kafka-jira mailing list archives

From "Onur Karaman (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (KAFKA-6134) High memory usage on controller during partition reassignment
Date Thu, 26 Oct 2017 23:29:00 GMT

    [ https://issues.apache.org/jira/browse/KAFKA-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221443#comment-16221443 ]

Onur Karaman edited comment on KAFKA-6134 at 10/26/17 11:28 PM:
----------------------------------------------------------------

If you want to port a fix to 1.0 without pulling in all of KAFKA-5642, I think you can just
lazily read the reassignment state when actually processing the PartitionReassignment, instead
of providing one as part of the PartitionReassignment instance, so that only one partition
reassignment mapping is allocated at any point in time.
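The lazy-read idea above can be illustrated with a minimal, self-contained sketch. This is not the actual controller code or the KAFKA-5642 patch; the names (`reassignmentZnode`, `EagerReassignment`, `LazyReassignment`) are hypothetical stand-ins. The point is only the difference between an event that captures a copy of the reassignment map at enqueue time and one that reads the shared state when it is processed:

```scala
// Toy sketch (hypothetical names, not Kafka code): compare an event that
// eagerly captures the reassignment map with one that reads it lazily.
object LazyReassignmentSketch {
  // Stand-in for the ZooKeeper reassignment znode contents.
  @volatile var reassignmentZnode: Map[String, Seq[Int]] =
    Map("topic-0" -> Seq(1, 2), "topic-1" -> Seq(2, 3))

  sealed trait ControllerEvent
  // Eager variant: each queued event retains its own copy of the map
  // (the 0.11 behavior being discussed).
  final case class EagerReassignment(state: Map[String, Seq[Int]]) extends ControllerEvent
  // Lazy variant: the event carries nothing; state is read only when the
  // event is processed, so at most one mapping is live at a time.
  case object LazyReassignment extends ControllerEvent

  def process(event: ControllerEvent): Map[String, Seq[Int]] = event match {
    case EagerReassignment(state) => state             // uses the captured copy
    case LazyReassignment         => reassignmentZnode // reads current state lazily
  }
}
```

Both variants observe the same state when processed; only the lazy one avoids retaining one map per queued event.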


was (Author: onurkaraman):
If you want to port a fix to 1.0 without pulling in all of KAFKA-5642, I think you can just
lazily read the reassignment state upon actually processing the PartitionReassignment so that
you'd only have one partition reassignment mapping allocated at any point in time.

> High memory usage on controller during partition reassignment
> -------------------------------------------------------------
>
>                 Key: KAFKA-6134
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6134
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.11.0.0, 0.11.0.1
>            Reporter: Jason Gustafson
>            Priority: Critical
>         Attachments: Screen Shot 2017-10-26 at 3.05.40 PM.png
>
>
> We've had a couple of users report spikes in memory usage when the controller is performing
partition reassignment in 0.11. After investigation, we found that the controller event queue
was using most of the retained memory. In particular, we found several thousand {{PartitionReassignment}}
objects, each one containing one fewer partition than the previous one (see the attached image).
> From the code, it seems clear why this is happening. We have a watch on the partition
reassignment path which adds the {{PartitionReassignment}} object to the event queue:
> {code}
>   override def handleDataChange(dataPath: String, data: Any): Unit = {
>     val partitionReassignment = ZkUtils.parsePartitionReassignmentData(data.toString)
>     eventManager.put(controller.PartitionReassignment(partitionReassignment))
>   }
> {code}
> In the {{PartitionReassignment}} event handler, we iterate through all of the partitions
in the reassignment. After we complete reassignment for each partition, we remove that partition
and update the node in zookeeper. 
> {code}
>     // remove this partition from that list
>     val updatedPartitionsBeingReassigned = partitionsBeingReassigned - topicAndPartition
>     // write the new list to zookeeper
>     zkUtils.updatePartitionReassignmentData(updatedPartitionsBeingReassigned.mapValues(_.newReplicas))
> {code}
> This triggers the handler above, which adds a new event to the queue. So what you get
is an O(n^2) increase in retained memory, where n is the number of partitions being reassigned.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
