From dev-return-104928-archive-asf-public=cust-asf.ponee.io@kafka.apache.org Wed Jun 12 20:29:28 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 8FD5618061A for ; Wed, 12 Jun 2019 22:29:28 +0200 (CEST) Received: (qmail 40020 invoked by uid 500); 12 Jun 2019 20:29:25 -0000 Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@kafka.apache.org Delivered-To: mailing list dev@kafka.apache.org Received: (qmail 39971 invoked by uid 99); 12 Jun 2019 20:29:20 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 12 Jun 2019 20:29:20 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 2BD7FC228E for ; Wed, 12 Jun 2019 20:29:19 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.8 X-Spam-Level: ** X-Spam-Status: No, score=2.8 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_REPLY=1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id 25t1Vd1FzN3q for ; Wed, 12 Jun 2019 20:29:17 +0000 (UTC) Received: from mail-ot1-f41.google.com (mail-ot1-f41.google.com [209.85.210.41]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id E7D8B5FCDB for ; Wed, 12 Jun 2019 20:29:16 +0000 (UTC) Received: by mail-ot1-f41.google.com with SMTP id r6so12459268oti.3 for ; Wed, 12 Jun 2019 13:29:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=Mmeu553EmcM1GCJZfybEdjhfgrTkZYqlQQyoXF7N9KI=; b=JO1QJ9arlv6O0j1xRiC9nJaprUmWIVZtms9OrxqGrs4U8AqbnPv+PKAQdbjL3uJ9j2 hsWTZ7X4G18zpl+zjbSme6Zrid7uB6rO2dtimg8WwFGde3PQ0pQmMknT5DT+7u5vPrMH L8kFk6RG++VrvSid6v3OoTCDOF0EW+f2P9Na3tSphzh62tFuERbu4wQNqjnSDqTkaQ5B FvbpvQaIvsbGUEqhnnWFlqT+iDcIeAz0ZHP8Utrm5rFVvvg8X005twiiCu413nrS/G6A cBBoPX5BrA1qtCz9xJU507sJk3LK0pKyslX1gSE5mG1UH5wJud0eSpTeu2TAxIYHyZo5 7wSg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=Mmeu553EmcM1GCJZfybEdjhfgrTkZYqlQQyoXF7N9KI=; b=pZb7If9lT0eMCxapqtEk+dpvm3pWtdMts/wjW+mWSwXv7RWIMKge9bGHgM/p+iDY9l JlpzqXQVKo9Hb7A2AotJyuxYAtzUHw+mNeEhRWhaw6LGqjX2fv3C38D1mmwlx+XjM+Qx g8X8K/OdO92O/ovQKYnWrWvTwQ6/sQG4KdZ8qh0+w3EatRgL+otuxZ2Qjey96USjS5G5 XPewUzt8ws5iJetR4dInU+yBEjslOtdQC6nUNM3/fvZ6I7juWnEL/ulxQddZpxbynnKN J99jg45RdDd/IjPKwqNhYcLzB6DKwUjjv70hshIuFpq6tS8oYaRHtv4FTi+ArGabcUti EzKg== X-Gm-Message-State: APjAAAVgpPmb6cE6s6JuTg+V7ijbE/esWEHDXYgbEVMam3kbzKFymO0E xuiRSxCFhrbcEkoBicXXMzKwz7w3jOqjILfbXAYIqOD//es= X-Google-Smtp-Source: APXvYqx7hwPUhacS6S9FmsGQcFQZ9Wh/kjrVHvAzKP663I9YxCH3uoxFYCCmwbNT8sMRblWbBCImIkjiaA/ev62auuk= X-Received: by 2002:a9d:7248:: with SMTP id a8mr9902148otk.363.1560371349935; Wed, 12 Jun 2019 13:29:09 -0700 (PDT) MIME-Version: 1.0 References: <82670b3d-ed8e-14f4-2ed2-04b0852c70c8@confluent.io> In-Reply-To: From: Guozhang Wang Date: Wed, 12 Jun 2019 13:28:58 -0700 Message-ID: Subject: Re: KIP-457: Add DISCONNECTED state to Kafka Streams To: dev Content-Type: multipart/alternative; boundary="00000000000007da6b058b264418" --00000000000007da6b058b264418 Content-Type: text/plain; charset="UTF-8" Hi Richard, Sorry for getting late on this, I've finally get some time to take a look at https://github.com/apache/kafka/pull/6594 as well as the KIP itself. Here are some thoughts: 1. The main motivation of this KIP is to be able to distinguish the case where a. "Streams client is in an unhealthy situation and hence cannot proceed" (which we have an ERROR state) and b. "Streams client is perfectly healthy, but it cannot get to the target brokers and hence cannot proceed", and this should also be distinguishable from c. "both Streams and brokers are healthy, there's just no data available for processing and hence cannot proceed"). And we want to have a way to notify the users about the second case b) distinguished from the others . 2. Following this, when I first thought about the solution I was thinking about adding a new state in the FSM of Kafka Streams, but after reviewing the code and the KIP, I felt this may be an overkill to complicate the FSM. Now I'm wondering if we can achieve the same thing with a single metric. For example: 2.a) we know that in Streams we always rely on consumer membership to allocate partitions to instances, which means that the heartbeat thread has to be working if the consumer wants to ever receive some data, what we can do is to let users monitor on this metric directly, e.g. if the heartbeat-rate drops to zero BUT the state is still in RUNNING it means we are in case b) above. 2.b) if we want to provide a streams-level metric out-of-the-box rather than letting users to monitor on consumer metrics, another idea is to leverage on existing "public Set assignment()" of KafkaConsumer, and record the time when it returns empty, meaning that nothing was assigned. And expose this as a boolean metric indicating nothing was assigned and hence we are likely in case b) above --- note this could also mean that we have fewer partitions than necessary so that some instance does not have any assignment indeed, which is not the same as b), but I feel consolidating these to cases with a single metric seem also fine. Guozhang On Wed, Apr 17, 2019 at 2:30 PM Richard Yu wrote: > Alright, so I made a few changes to the KIP. > I realized that there might be an easier way to give the user information > on the connection state of Kafka Streams. > In implementation, if one wishes to have DISCONNECTED as a state, then one > would have to factor in proper state transitions. > The other approach that is now outlined in the KIP. Instead, we could just > add a method which I think achieves the same effect. > If any of you thinks there is wrong with this approach, please let me know. > :) > > Cheers, > Richard > > On Wed, Apr 17, 2019 at 11:49 AM Richard Yu > wrote: > > > I just realized something. > > > > Hi Matthias, might need your input here. > > I realized that when implementing this change, as noted in the JIRA, we > > would need to "check the behaviour of the consumer" since its consumer's > > connection with broker that we are dealing with. > > > > So doesn't that mean we would also be dealing with consumer API changes > as > > well? > > I don't think consumer has any methods which would give us the state of a > > connection either. > > > > - Richard > > > > On Wed, Apr 17, 2019 at 8:43 AM Richard Yu > > wrote: > > > >> Hi Micheal, > >> > >> Yeah, those are some points I should've clarified. > >> No problem. Have got it done. > >> > >> > >> > >> On Wed, Apr 17, 2019 at 6:42 AM Michael Noll > >> wrote: > >> > >>> Richard, > >>> > >>> thanks for looking into this! > >>> > >>> However, I have some concerns. The KIP you created ( > >>> > >>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-457%3A+Add+DISCONNECTED+status+to+Kafka+Streams > >>> ) > >>> doesn't yet address open questions such as the ones mentioned by > >>> Matthias: > >>> > >>> 1) What is the difference between DEAD and the proposed DISCONNECTED? > >>> This > >>> should be defined in the KIP. > >>> > >>> 2) Difference between your KIP and the JIRA ( > >>> https://issues.apache.org/jira/browse/KAFKA-6520): In the JIRA ticket, > >>> the > >>> DISCONNECTED state was proposed for the scenario that the KStreams > >>> application is healthy but the Kafka broker is down. This is different > to > >>> what you wrote in the KIP: "When something happens in Kafka Streams, > such > >>> as an unexpected crash or error, KafkaStreams#state() will return > >>> State.DISCONNECTED.", which seems to mean that DISCONNECTED should be > the > >>> state when the KStreams app is down. > >>> > >>> I wouldn't expect a KIP vote to pass if these basic questions aren't > >>> properly sorted out in the KIP. > >>> > >>> Best, > >>> Michael > >>> > >>> > >>> > >>> On Wed, Apr 17, 2019 at 3:35 AM Richard Yu > > >>> wrote: > >>> > >>> > Hi all, > >>> > > >>> > Considering that this is a simple KIP, I would probably start the > >>> voting > >>> > tomorrow. > >>> > I think it would be good if we could get this in fast. > >>> > > >>> > On Tue, Apr 16, 2019 at 3:31 PM Richard Yu < > yohan.richard.yu@gmail.com > >>> > > >>> > wrote: > >>> > > >>> > > Oh, I probably misunderstood the difference between DISCONNECTED > and > >>> > DEAD. > >>> > > I will update the KIP accordingly. > >>> > > Thanks for pointing that out! > >>> > > > >>> > > > >>> > > On Tue, Apr 16, 2019 at 3:13 PM Matthias J. Sax < > >>> matthias@confluent.io> > >>> > > wrote: > >>> > > > >>> > >> Thanks for the initiative. > >>> > >> > >>> > >> In the motivation you mention that you want to use DISCONNECT to > >>> > >> indicate that the application was killed. > >>> > >> > >>> > >> What is the difference to existing state DEAD? > >>> > >> > >>> > >> Also, the backing JIRA seems to have a different motivation to > add a > >>> > >> DISCONNECT state. There, the Kafka Streams application itself is > >>> > >> healthy, but it cannot connect to the brokers. It seems reasonable > >>> to > >>> > >> add a DISCONNECT for this case though. > >>> > >> > >>> > >> > >>> > >> > >>> > >> -Matthias > >>> > >> > >>> > >> > >>> > >> > >>> > >> On 4/16/19 9:30 AM, Richard Yu wrote: > >>> > >> > Hi all, > >>> > >> > > >>> > >> > I like to propose a small KIP on adding a new state to > >>> > >> KafkaStreams#state(). > >>> > >> > It is very simple, so this should pass relatively quickly! > >>> > >> > Here is the discussion link: > >>> > >> > > >>> > >> > >>> > > >>> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-457%3A+Add+DISCONNECTED+status+to+Kafka+Streams > >>> > >> > > >>> > >> > Cheers, > >>> > >> > Richard > >>> > >> > > >>> > >> > >>> > >> > >>> > > >>> > >> > -- -- Guozhang --00000000000007da6b058b264418--