Date: Mon, 11 Sep 2017 18:42:00 +0000 (UTC)
From: "Raoufeh Hashemian (JIRA)"
To: jira@kafka.apache.org
Subject: [jira] [Comment Edited] (KAFKA-5857) Excessive heap usage on controller node during reassignment

[ https://issues.apache.org/jira/browse/KAFKA-5857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161757#comment-16161757 ]

Raoufeh Hashemian edited comment on KAFKA-5857 at 9/11/17 6:41 PM:
-------------------------------------------------------------------

We have 960 partitions for this topic. I wonder what the relationship is between the controller's memory usage and the number of partitions. Is it linear? And is 960 partitions too many for a single topic?
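(For context, a quick way to confirm how many partition/replica entries the controller has to track for this topic is sketched below. The topic name and ZooKeeper connect string are placeholders, not taken from this report.)

    # Count the per-partition lines for the topic (placeholder names).
    bin/kafka-topics.sh --zookeeper zk1:2181/kafka --describe --topic my-topic \
        | grep -c "Partition:"
    # With 960 partitions at replication factor 3, that is 2,880 replica
    # assignments involved in the reassignment.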
> Excessive heap usage on controller node during reassignment
> -----------------------------------------------------------
>
>                 Key: KAFKA-5857
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5857
>             Project: Kafka
>          Issue Type: Bug
>          Components: controller
>    Affects Versions: 0.11.0.0
>         Environment: CentOS 7, Java 1.8
>            Reporter: Raoufeh Hashemian
>              Labels: reliability
>             Fix For: 1.1.0
>
>         Attachments: CPU.png, disk_write_x.png, memory.png, reassignment_plan.txt
>
>
> I was trying to expand our Kafka cluster from 6 broker nodes to 12 broker nodes.
> Before the expansion, we had a single topic with 960 partitions and a replication factor of 3, so each node held 480 partitions. The size of the data on each node was 3 TB.
> To do the expansion, I submitted a partition reassignment plan (see the attached file for the current/new assignments). The plan was optimized to minimize data movement and to be rack aware.
> After I submitted the plan, it took approximately 3 hours to move the data from the old nodes to the new ones. After that, the cluster started deleting the source partitions (I say this based on the number of file descriptors) and rebalancing leaders, which did not succeed. Meanwhile, heap usage on the controller node started to climb steeply (along with long GC times); it took 5 hours for the controller to run out of memory, and then another controller started to show the same behaviour for another 4 hours. At that point ZooKeeper ran out of disk and the service stopped.
> To recover from this condition:
> 1) Removed ZooKeeper logs to free up disk and restarted all 3 ZooKeeper nodes.
> 2) Deleted the /kafka/admin/reassign_partitions node from ZooKeeper.
> 3) Did unclean restarts of the Kafka service on the OOM controller nodes, which took 3 hours to complete. After this stage there were still 676 under-replicated partitions.
> 4) Did a clean restart on all 12 broker nodes.
> After step 4, the number of under-replicated partitions went to 0.
> So I was wondering if this memory footprint from the controller is expected for roughly 1k partitions. Did we do something wrong, or is it a bug?
> Attached are some resource usage graphs covering this 30-hour event, along with the reassignment plan. I'll try to add log files as well.
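(For reference, a rough sketch of the commands behind the workflow and recovery steps described above. The topic name, broker ids, and ZooKeeper connect string are placeholders; the /kafka chroot is inferred from the znode path mentioned in recovery step 2.)

    # reassignment_plan.txt entries have this shape (placeholder topic/broker ids):
    #   {"version":1,
    #    "partitions":[{"topic":"my-topic","partition":0,"replicas":[1,7,10]}, ...]}

    # Submit the plan; in 0.11 the tool writes it to ZooKeeper for the controller.
    bin/kafka-reassign-partitions.sh --zookeeper zk1:2181/kafka \
        --reassignment-json-file reassignment_plan.txt --execute

    # Check progress of the same plan.
    bin/kafka-reassign-partitions.sh --zookeeper zk1:2181/kafka \
        --reassignment-json-file reassignment_plan.txt --verify

    # Recovery step 2 above: remove the pending-reassignment znode so the
    # controller stops acting on the stuck plan (path as given in the report).
    bin/zookeeper-shell.sh zk1:2181
    delete /kafka/admin/reassign_partitions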