Date: Fri, 27 Oct 2017 05:11:00 +0000 (UTC)
From: "ASF GitHub Bot (JIRA)"
To: jira@kafka.apache.org
Subject: [jira] [Commented] (KAFKA-6134) High memory usage on controller during partition reassignment

[ https://issues.apache.org/jira/browse/KAFKA-6134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16221718#comment-16221718 ]

ASF GitHub Bot commented on KAFKA-6134:
---------------------------------------

Github user hachikuji closed the pull request at:

    https://github.com/apache/kafka/pull/4141

> High memory usage on controller during partition reassignment
> -------------------------------------------------------------
>
>                 Key: KAFKA-6134
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6134
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.11.0.0, 0.11.0.1
>            Reporter: Jason Gustafson
>            Assignee: Jason Gustafson
>            Priority: Critical
>              Labels: regression
>             Fix For: 1.0.0, 0.11.0.2
>
>         Attachments: Screen Shot 2017-10-26 at 3.05.40 PM.png
>
>
> We've had a couple of users reporting spikes in memory usage when the controller is performing partition reassignment in 0.11. After investigation, we found that the controller event queue was using most of the retained memory.
> In particular, we found several thousand {{PartitionReassignment}} objects, each one containing one fewer partition than the previous one (see the attached image).
> From the code, it seems clear why this is happening. We have a watch on the partition reassignment path which adds the {{PartitionReassignment}} object to the event queue:
> {code}
> override def handleDataChange(dataPath: String, data: Any): Unit = {
>   val partitionReassignment = ZkUtils.parsePartitionReassignmentData(data.toString)
>   eventManager.put(controller.PartitionReassignment(partitionReassignment))
> }
> {code}
> In the {{PartitionReassignment}} event handler, we iterate through all of the partitions in the reassignment. After we complete reassignment for each partition, we remove that partition and update the node in ZooKeeper:
> {code}
> // remove this partition from that list
> val updatedPartitionsBeingReassigned = partitionsBeingReassigned - topicAndPartition
> // write the new list to zookeeper
> zkUtils.updatePartitionReassignmentData(updatedPartitionsBeingReassigned.mapValues(_.newReplicas))
> {code}
> This triggers the handler above, which adds a new event to the queue. The result is an O(n^2) increase in memory, where n is the number of partitions being reassigned.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
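To make the quadratic growth concrete, here is a small Python sketch of the feedback loop described above. This is a simulation, not Kafka code: the queue and the `simulate` function are illustrative stand-ins for the controller event queue and the watch/handler cycle.

```python
# Hypothetical simulation (not Kafka code) of the controller's
# reassignment feedback loop: each ZooKeeper write re-triggers the
# watch, which enqueues a snapshot with one fewer partition.

from collections import deque

def simulate(n_partitions):
    """Return the total number of partition references that pass
    through the event queue for a reassignment of n_partitions."""
    queue = deque()
    # The initial watch fires with the full reassignment.
    queue.append(set(range(n_partitions)))
    total_enqueued = 0
    while queue:
        reassignment = queue.popleft()
        total_enqueued += len(reassignment)
        # The handler completes one partition, then writes the
        # shrunken list back to ZooKeeper, re-triggering the watch
        # and enqueuing another (slightly smaller) event.
        remaining = reassignment - {min(reassignment)}
        if remaining:
            queue.append(remaining)
    return total_enqueued

# Snapshots of size n, n-1, ..., 1 add up to n(n+1)/2 references,
# i.e. O(n^2) retained memory rather than O(n).
```

For example, reassigning 1,000 partitions enqueues 500,500 partition references in total, which matches the attached heap screenshot showing thousands of progressively smaller {{PartitionReassignment}} objects.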