From dev-return-99489-archive-asf-public=cust-asf.ponee.io@kafka.apache.org Tue Oct 30 02:16:16 2018
Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm
Reply-To: dev@kafka.apache.org
MIME-Version: 1.0
References: <1537488165.3937608.1515434424.24238573@webmail.messagingengine.com>
From: xiongqi wu
Date: Mon, 29 Oct 2018 18:15:48 -0700
Subject: Re: [DISCUSS] KIP-370: Remove Orphan Partitions
To: dev@kafka.apache.org
Content-Type: multipart/alternative; boundary="000000000000afa50e057967ed9e"

--000000000000afa50e057967ed9e
Content-Type: text/plain; charset="UTF-8"

Thanks Dong.
I have updated the KIP.

Instead of using a configuration to specify the timeout, I switched it to use
an internal timer. Users don't need a new configuration to use this feature.

Xiongqi (Wesley) Wu


On Mon, Oct 29, 2018 at 4:40 PM xiongqi wu wrote:

> Dong,
>
> Thanks for the comments.
>
> 1) With KIP-380, in theory we don't need the timeout phase.
> However, once orphan partitions are removed, they cannot be recovered.
> The question is whether we should rely on the fact that the first
> leaderandISR always contains correct information.
>
> For retention-enabled topics, the deletion phase (step 3 in this KIP) will
> protect against deletion of new segments.
> For log-compacted topics, since log segments can be relatively old, the
> deletion phase might delete useful segments if by any chance the first
> leaderandISR is incorrect.
>
> Here is the difference with/without the timeout phase:
> Solution 1: without the timeout phase, we rely on the first leaderandISR and
> accept that if the first leaderandISR is incorrect, we might lose data. We
> don't protect against bugs.
> Solution 2: with the timeout phase, we rely on the fact that, during the
> timeout period, there is at least one valid leaderandISR for any given
> partition hosted by the broker.
> This comes with the complexity of adding a timeout configuration.
>
> Solution 2 is the safer option, at the cost of a timeout configuration.
> *What is your opinion on these two solutions?*
>
>
> For your second comment:
>
> I will change the metric description. Thanks for pointing out the right
> metric format.
>
>
> Xiongqi (Wesley) Wu
>
>
> On Sun, Oct 28, 2018 at 9:39 PM Dong Lin wrote:
>
>> Hey Xiongqi,
>>
>> Thanks for the KIP. Here are some comments:
>>
>> 1) The KIP provides two motivations for the timeout/correction phase. One
>> motivation is to handle outdated requests. Would this still be an issue
>> after KIP-380? The second motivation seems to be mainly for performance
>> optimization when there is reassignment. In general we expect data
>> movement when we reassign partitions to new brokers. So this is probably
>> not a strong reason for adding a new config.
>>
>> 2) The KIP says "Adding metrics to keep track of the number of orphan
>> partitions and the size of these orphan partitions". Can you add the
>> specification of these new metrics? Here are example docs in
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-237%3A+More+Controller+Health+Metrics
>> .
>>
>> Thanks,
>> Dong
>>
>> On Thu, Sep 20, 2018 at 5:40 PM xiongqi wu wrote:
>>
>> > Colin,
>> >
>> > Thanks for the comment.
>> > 1)
>> > auto.orphan.partition.removal.delay.ms refers to the timeout since the
>> > first leader and ISR request was received. The idea is that we want to
>> > wait long enough to receive up-to-date leaderandISR requests and any old
>> > or new partition reassignment requests.
>> >
>> > 2)
>> > Is there any logic to remove the partition folders on disk? I can only
>> > find references to removing older log segments, but not the folder, in
>> > the KIP.
>> > ==> yes, the plan is to remove partition folders as well.
>> >
>> > I will update the KIP to make it clearer.
>> >
>> >
>> > Xiongqi (Wesley) Wu
>> >
>> >
>> > On Thu, Sep 20, 2018 at 5:02 PM Colin McCabe wrote:
>> >
>> > > Hi Xiongqi,
>> > >
>> > > Thanks for the KIP.
>> > >
>> > > Can you be a bit more clear about what the timeout
>> > > auto.orphan.partition.removal.delay.ms refers to? Is the timeout
>> > > measured since the partition was supposed to be on the broker? Or is
>> > > the timeout measured since the broker started up?
>> > >
>> > > Is there any logic to remove the partition folders on disk? I can
>> > > only find references to removing older log segments, but not the
>> > > folder, in the KIP.
>> > >
>> > > best,
>> > > Colin
>> > >
>> > > On Wed, Sep 19, 2018, at 10:53, xiongqi wu wrote:
>> > > > Any comments?
>> > > >
>> > > > Xiongqi (Wesley) Wu
>> > > >
>> > > >
>> > > > On Mon, Sep 10, 2018 at 3:04 PM xiongqi wu wrote:
>> > > >
>> > > > > Here is the implementation for KIP-370.
>> > > > >
>> > > > > https://github.com/xiowu0/kafka/commit/f1bd3085639f41a7af02567550a8e3018cfac3e9
>> > > > >
>> > > > > The purpose is to do a one-time cleanup (after a configured delay)
>> > > > > of orphan partitions when a broker starts up.
>> > > > >
>> > > > >
>> > > > > Xiongqi (Wesley) Wu
>> > > > >
>> > > > >
>> > > > > On Wed, Sep 5, 2018 at 10:51 AM xiongqi wu wrote:
>> > > > >
>> > > > >> This KIP enables the broker to remove orphan partitions
>> > > > >> automatically.
>> > > > >>
>> > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-370%3A+Remove+Orphan+Partitions
>> > > > >>
>> > > > >>
>> > > > >> Xiongqi (Wesley) Wu
>> > > > >>

--000000000000afa50e057967ed9e--