Date: Sat, 25 Feb 2017 21:46:44 +0000 (UTC)
From: "Jiangjie Qin (JIRA)"
To: dev@kafka.apache.org
Subject: [jira] [Commented] (KAFKA-3436) Speed up controlled shutdown.

    [ https://issues.apache.org/jira/browse/KAFKA-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15884390#comment-15884390 ]

Jiangjie Qin commented on KAFKA-3436:
-------------------------------------

[~onurkaraman] is currently working on the controller rewrite. The latest trunk already has some controlled shutdown performance improvements from batching the partitions. Have you had a chance to try it?

> Speed up controlled shutdown.
> -----------------------------
>
>                 Key: KAFKA-3436
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3436
>             Project: Kafka
>          Issue Type: Improvement
>    Affects Versions: 0.9.0.0
>            Reporter: Jiangjie Qin
>            Assignee: Jiangjie Qin
>             Fix For: 0.10.3.0
>
>
> Currently, a rolling bounce of a Kafka cluster with tens of thousands of partitions can take very long (~2 min per broker with ~5000 partitions/broker in our environment). The majority of the time is spent shutting down a broker. The time to shut down a broker usually includes the following parts:
> T1: During controlled shutdown, people usually want to make sure there are no under-replicated partitions. So shutting down a broker during a rolling bounce has to wait for the previously restarted broker to catch up. This is T1.
> T2: The time to send the controlled shutdown request and receive the controlled shutdown response. Currently a controlled shutdown request triggers many LeaderAndIsrRequests and UpdateMetadataRequests, and also involves many zookeeper updates in serial.
> T3: The actual time to shut down all the components. It is usually small compared with T1 and T2.
> T1 is related to:
> A) the inbound throughput on the cluster, and
> B) the "down" time of the broker (the time between the replica fetchers stopping and restarting)
> The larger the traffic is, or the longer the broker stops fetching, the longer it takes for the broker to catch up and get back into the ISR, and therefore the longer T1 is. Assume:
> * the inbound network traffic is X bytes/second on a broker
> * the time T1.B ("down" time) mentioned above is T
> Theoretically it will take (X * T) / (NetworkBandwidth - X) = InboundNetworkUtilization * T / (1 - InboundNetworkUtilization) for the broker to catch up after the restart (a quick numeric check follows below). While X is out of our control, T is largely related to T2.
> The purpose of this ticket is to reduce T2 by:
> 1. Batching the LeaderAndIsrRequest and UpdateMetadataRequest during controlled shutdown.
> 2. Using async zookeeper writes to pipeline the zookeeper updates. According to the ZooKeeper wiki (https://wiki.apache.org/hadoop/ZooKeeper/Performance), a 3 node ZK cluster should be able to handle 20K writes/second (1K size). So if we use async writes, we will likely be able to reduce the zookeeper update time to low seconds or even sub-second level. (Rough sketches of both ideas follow after this quote.)
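To make the catch-up estimate quoted above concrete, here is a small self-contained check of (X * T) / (NetworkBandwidth - X). The numbers are illustrative assumptions (a ~1 Gbps NIC, 50 MB/s inbound traffic, 120 s of down time), not measurements from this ticket:

// Illustrative numbers only; they are assumptions, not values from KAFKA-3436.
public class CatchUpEstimate {
    public static void main(String[] args) {
        double bandwidth = 125_000_000.0; // ~1 Gbps NIC in bytes/second (assumed)
        double x = 50_000_000.0;          // inbound traffic X = 50 MB/s (assumed)
        double t = 120.0;                 // broker "down" time T in seconds (assumed)

        // Backlog that built up while the broker was not fetching.
        double backlogBytes = x * t;

        // Spare inbound capacity available for catching up.
        double spareBytesPerSec = bandwidth - x;

        // (X * T) / (NetworkBandwidth - X), as in the ticket description.
        double catchUpSeconds = backlogBytes / spareBytesPerSec;
        System.out.printf("Estimated catch-up time: %.1f seconds%n", catchUpSeconds);
        // With these assumed numbers: 50e6 * 120 / 75e6 = 80 seconds.
    }
}

The same numbers in the utilization form give 0.4 * 120 / (1 - 0.4) = 80 seconds, so the two expressions agree.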
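The batching in item 1 amounts to grouping per-partition state changes by destination broker and sending one request per broker instead of one per partition. The sketch below only illustrates that grouping; PartitionState and StateChangeSender are made-up types, not Kafka's actual controller classes:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BatchedStateChanges {

    // Made-up value type for a partition's new leader/ISR state.
    record PartitionState(String topic, int partition, int leader, List<Integer> isr) { }

    // Made-up transport: one RPC per destination broker carrying a whole batch.
    interface StateChangeSender {
        void sendLeaderAndIsr(int brokerId, List<PartitionState> batch);
    }

    // Group the state changes by receiving broker so each broker gets a single
    // batched request instead of one request per partition.
    static void propagate(Map<PartitionState, List<Integer>> replicasByPartition,
                          StateChangeSender sender) {
        Map<Integer, List<PartitionState>> batches = new HashMap<>();
        replicasByPartition.forEach((state, replicas) ->
            replicas.forEach(brokerId ->
                batches.computeIfAbsent(brokerId, id -> new ArrayList<>()).add(state)));
        batches.forEach(sender::sendLeaderAndIsr);
    }
}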
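For item 2, the ZooKeeper Java client already exposes asynchronous variants of its write calls, so updates can be pipelined on one session instead of paying a round trip per znode. A minimal sketch, assuming a reachable ensemble at localhost:2181 and pre-existing znodes; the paths and payload are made up and this is not the Kafka controller code:

import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class AsyncZkWrites {
    public static void main(String[] args) throws Exception {
        CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> {
            if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                connected.countDown();
            }
        });
        connected.await();

        // Hypothetical znodes to update, e.g. per-partition state paths.
        List<String> paths = List.of("/demo/partition-0", "/demo/partition-1", "/demo/partition-2");
        byte[] newState = "{\"leader\":1,\"isr\":[1,2]}".getBytes(StandardCharsets.UTF_8);

        CountDownLatch done = new CountDownLatch(paths.size());
        AsyncCallback.StatCallback onWrite = (int rc, String path, Object ctx, Stat stat) -> {
            if (KeeperException.Code.get(rc) != KeeperException.Code.OK) {
                System.err.println("Write to " + path + " failed: " + KeeperException.Code.get(rc));
            }
            done.countDown();
        };

        // Issue all writes without waiting for each response: the requests are
        // pipelined on the session instead of costing one round trip per znode.
        for (String path : paths) {
            zk.setData(path, newState, -1, onWrite, null);
        }

        done.await();   // Wait once, after all writes are in flight.
        zk.close();
    }
}

Because requests from a single session are processed in submission order, the writes are still applied in the order they were issued; the client simply stops blocking on each individual response.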