From dev-return-105788-archive-asf-public=cust-asf.ponee.io@kafka.apache.org Tue Jul 16 21:25:37 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id A059918064E for ; Tue, 16 Jul 2019 23:25:37 +0200 (CEST) Received: (qmail 90092 invoked by uid 500); 16 Jul 2019 21:25:35 -0000 Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@kafka.apache.org Delivered-To: mailing list dev@kafka.apache.org Received: (qmail 90079 invoked by uid 99); 16 Jul 2019 21:25:33 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 16 Jul 2019 21:25:33 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 532EEC00C7 for ; Tue, 16 Jul 2019 21:25:33 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.801 X-Spam-Level: * X-Spam-Status: No, score=1.801 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=confluent.io Received: from mx1-ec2-va.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id 4rW72hpQymNS for ; Tue, 16 Jul 2019 21:25:31 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=209.85.214.181; helo=mail-pl1-f181.google.com; envelope-from=sophie@confluent.io; receiver= Received: from mail-pl1-f181.google.com (mail-pl1-f181.google.com [209.85.214.181]) by mx1-ec2-va.apache.org (ASF Mail Server at mx1-ec2-va.apache.org) with ESMTPS id 9BD5DBC7AD for ; Tue, 16 Jul 2019 21:25:30 +0000 (UTC) Received: by mail-pl1-f181.google.com with SMTP id c14so10744703plo.0 for ; Tue, 16 Jul 2019 14:25:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=confluent.io; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=iFvfK+1oDVc91aL6KvIV2lfmlKial+sY9TXzC6uziPQ=; b=ZA7RepEClVa+6Qt852fR9NIRkbU0XoQXAH7JsiHKdSEOf4XBavPG6ITH2BUxOsi5Ld e/xM2Yu8zemt1rMStf6xkSRZ0ALutfCig95PvNbf04qBcZg8cac6qMLvGCfWc5ajuuVh Wf7j4e0AvLtcPdkeTVtbCqX079QL6Jup1zU2RAAiL5nqUG3sAFTBhQEwkATfqwTPK5E6 Czjc7hB6W71LcaASa+874P5Njwx9zWSLmZrW1SdFk5rJu8aVXsBdHueqAhDG5GOAZdyT /cA+wtvYABYlt9TT/BMOd8ALtBJFw1b87FIbBhFq94EGUXNNdbWLSsk9W1JgfEh2kyIE oweQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=iFvfK+1oDVc91aL6KvIV2lfmlKial+sY9TXzC6uziPQ=; b=ia73Svqk0SWS/YlNLQLkBDxJ9iEW3D5MyeJWHw2Z/6cPAL4A4j0JJRAsB/tu1Gocie leRoV6YgXOSeqcb0aJDuLyGaiFi3T4FMas56M7Tqas2euDoH10eaTI84MLwoehX/pj7l ab9Fmnc3oHb8IMFaCUY8GZ2aYiySqB59SN596fz8HK2oZpWqhq//X0qx4pZ7KjqhdG2U U9+0/0k2hfCN/3welijJKVV8bDSbWSQiol0e5+Kc9XP5J5oNEZAuKV7V5hTSQ7JHHkMP +hzU5fFjBFA4euA1szaQzBtUL6Z25+RbdXcOTnHb68c6vVRklCfpR+NrwwlbzdhtrcQs xdpg== X-Gm-Message-State: APjAAAU7aDsWsvaAh13h/u8OpokuO7ngVMxDey9oJ+Ths4a9C/Cdl/8A rgC8wI6zrkMSiRXBjdTAe4F8X2k1ZvHuVNyGWq4m6izT X-Google-Smtp-Source: APXvYqy1Mybp75L387++Lgow/eyenjjTR2vHvegcRr3VSWvXuYjpb88iys+TpcwDD12CXJu620UXvG+KI5DbBskDMkM= X-Received: by 2002:a17:902:ab83:: with SMTP id f3mr38429407plr.122.1563312329521; Tue, 16 Jul 2019 14:25:29 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Sophie Blee-Goldman Date: Tue, 16 Jul 2019 14:25:18 -0700 Message-ID: Subject: Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics To: dev Content-Type: multipart/alternative; boundary="000000000000131d98058dd30420" --000000000000131d98058dd30420 Content-Type: text/plain; charset="UTF-8" I'm +1 on Windowed, was about to suggest that as I was catching up on the discussion but Bill beat me to it :) On Tue, Jul 16, 2019 at 2:23 PM Bill Bejeck wrote: > Hi John, > > Thanks for the updates. > > I like RunningCount and RunningSum. > > What about WindowedCount, WindowedSum instead of Moving? > I'm just throwing this out there as Windowed seems more intuitive to me, > but I'm not married to the idea. > > -Bill > > On Tue, Jul 16, 2019 at 5:09 PM John Roesler wrote: > > > No worries! Choosing good public API names is a high-impact design > > activity. > > > > Matthias, Bruno, Bill, and Stanislav, > > > > You've all contributed to this discussion or the vote so far... How do > > you feel about the proposed name change: > > > > MovingCount, MovingSum (instead of Sampled) > > RunningCount, RunningSum (Instead of Total) > > > > Thanks, > > -John > > > > On Tue, Jul 16, 2019 at 3:04 PM Ryanne Dolan > > wrote: > > > > > > John, that makes sense to me. Sorry for the bikeshedding. > > > > > > Ryanne > > > > > > On Tue, Jul 16, 2019 at 12:49 PM John Roesler > wrote: > > > > > > > Thanks for the explanation and the suggestion, Ryanne, > > > > > > > > I went with "sampled" just because these are instances of > SampledStat, > > > > which in the Kafka Metrics ecosystem are computed from a window of > > > > recent samples. Thinking more about it, the fact that they are > sampled > > > > and the fact that they are windowed are orthogonal, which is what > > > > you're pointing out... sampling by itself doesn't indicate that it's > a > > > > moving average. > > > > > > > > Since there is no way in Kafka Metrics for a metric to be sampled and > > > > not windowed/moving/decaying, calling them Sampled would never be > > > > incorrect. But to someone unfamiliar with the code, it wouldn't > > > > immediately suggest the behavior of the metric that actually matters. > > > > That is, the behavior that distinguishes the two classes of metrics > we > > > > want to disambiguate here. > > > > > > > > It sounds like you'd suggest a new matrix of names: > > > > MovingCount, MovingSum > > > > RunningCount, RunningSum > > > > > > > > Are these names unambiguous and self explanatory? > > > > > > > > Thanks, > > > > -John > > > > > > > > On Tue, Jul 16, 2019 at 12:32 PM Ryanne Dolan > > > > > wrote: > > > > > > > > > > > measurements, which decay/expire over time > > > > > > > > > > Thanks John for the clarification. This was my (re-)reading of the > > code, > > > > > but this is not what I think of when I hear "sampled". In fact, > > you'll > > > > > notice that the Wikipedia pages for "Sample (statistics)" and > "Sample > > > > > (signal processing)" do not contain the words decay, expire, > recent, > > > > > history, or anything similar. > > > > > > > > > > Similar to "running", I'd suggest the more correct "moving", as in > > > > "moving > > > > > average" and "moving sum", which involve looking back N samples, > > > > applying a > > > > > sliding window, decaying over time etc. > > > > > > > > > > Ryanne > > > > > > > > > > On Tue, Jul 16, 2019, 11:58 AM John Roesler > > wrote: > > > > > > > > > > > Thanks for raising this concern, Ryanne, > > > > > > > > > > > > "Sampled" indicates that the metrics is sampled, namely that we > > > > > > maintain a set of samples from recent value measurements, which > > > > > > decay/expire over time. So, the metric value is only > > representative of > > > > > > the recent past. > > > > > > > > > > > > "Total" indicates that the metric value contains all the > > information > > > > > > from the creation of the metric. For example., the total sum > would > > > > > > include all measurements since the app started up. > > > > > > > > > > > > It seems like your concern is that the word "total" doesn't > really > > > > > > pinpoint this meaning, which is true. It's especially confusing > > that > > > > > > another meaning of "total" is synonymous with "sum", rendering > the > > > > > > name "TotalSum" sort of absurd. > > > > > > > > > > > > We previously considered "cumulative", which was rejected as a > > > > > > mouthful (it's four syllables) . > > > > > > > > > > > > You mentioned "running", which might be a more appropriate > modifier > > > > > > (RunningSum and RunningCount). > > > > > > > > > > > > What would everyone think about that? > > > > > > > > > > > > Thanks, > > > > > > -John > > > > > > > > > > > > On Tue, Jul 16, 2019 at 11:27 AM Ryanne Dolan < > > ryannedolan@gmail.com> > > > > > > wrote: > > > > > > > > > > > > > > John, I mentioned on the VOTE thread that the proposed names > are > > a > > > > bit > > > > > > > confusing, > > > > > > > > > > > > > > > given that "sum", "total", and "count" are roughly > > synonymous... > > > > > > > > > > > > > > In particular, TotalSum is, I think, a "running total", though > > the > > > > naming > > > > > > > doesn't really capture that. > > > > > > > > > > > > > > I think to avoid confusion, we should define exactly what > > "total" and > > > > > > > "sampled" are supposed to indicate, and perhaps pick > appropriate > > > > naming > > > > > > > from there. > > > > > > > > > > > > > > Ryanne > > > > > > > > > > > > > > > > > > > > > On Fri, Jul 12, 2019 at 1:42 PM John Roesler < > john@confluent.io> > > > > wrote: > > > > > > > > > > > > > > > Hey, thanks Matthias and Bruno, > > > > > > > > > > > > > > > > I agree, "Cumulative" is a mouthful. "TotalX" sounds fine to > > me. > > > > > > > > > > > > > > > > Also, yes, I would have liked to not have any modifier for > > > > > > > > "non-sampled", but there is a name conflict with Sum. > > > > > > > > > > > > > > > > I'll update the KIP to reflect "TotalX" and then start the > vote > > > > thread. > > > > > > > > > > > > > > > > Thanks again, > > > > > > > > -John > > > > > > > > > > > > > > > > On Fri, Jul 12, 2019 at 11:27 AM Bruno Cadonna < > > bruno@confluent.io > > > > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > OK, makes sense. Then, I am in favour of TotalCount and > > TotalSum. > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > Bruno > > > > > > > > > > > > > > > > > > On Fri, Jul 12, 2019 at 12:57 AM Matthias J. Sax < > > > > > > matthias@confluent.io> > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > `Sum` is an existing name, for the "sampled sum" metric, > > that > > > > gets > > > > > > > > > > deprecated. Hence, we cannot use it. > > > > > > > > > > > > > > > > > > > > If we cannot use `Sum` and use `TotalSum`, we should also > > not > > > > use > > > > > > > > > > `Count` but `TotalCount` for consistency. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -Matthias > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On 7/11/19 12:58 PM, Bruno Cadonna wrote: > > > > > > > > > > > Hi John, > > > > > > > > > > > > > > > > > > > > > > Thank you for the KIP. > > > > > > > > > > > > > > > > > > > > > > LGTM > > > > > > > > > > > > > > > > > > > > > > I also do not like CumulativeSum/Count so much. I > > propose to > > > > just > > > > > > > > call > > > > > > > > > > > it Sum and Count. > > > > > > > > > > > > > > > > > > > > > > I understand that you want to unequivocally distinguish > > the > > > > two > > > > > > > > metric > > > > > > > > > > > functions by their names, but I have the feeling the > > names > > > > become > > > > > > > > > > > artificially complex. The exact semantics can also be > > > > documented > > > > > > in > > > > > > > > > > > the javadocs, which btw could also be improved in those > > > > classes. > > > > > > > > > > > > > > > > > > > > > > Best, > > > > > > > > > > > Bruno > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Jul 11, 2019 at 8:25 PM Matthias J. Sax < > > > > > > > > matthias@confluent.io> wrote: > > > > > > > > > > >> > > > > > > > > > > >> Thanks for the KIP. Overall LGTM. > > > > > > > > > > >> > > > > > > > > > > >> The only though I have is, if we may want to use > > `TotalSum` > > > > and > > > > > > > > > > >> `TotalCount` instead of `CumulativeSum/Count` as > names? > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> -Matthias > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >> On 7/11/19 9:31 AM, John Roesler wrote: > > > > > > > > > > >>> Hi Kafka devs, > > > > > > > > > > >>> > > > > > > > > > > >>> I'd like to propose KIP-488 as a minor cleanup of > some > > of > > > > our > > > > > > > > metric > > > > > > > > > > >>> implementations. > > > > > > > > > > >>> > > > > > > > > > > >>> KIP-488: > https://cwiki.apache.org/confluence/x/kkAyBw > > > > > > > > > > >>> > > > > > > > > > > >>> Over time, iterative updates to these metrics has > > resulted > > > > in a > > > > > > > > pretty > > > > > > > > > > >>> confusing little collection of classes, and I've > > personally > > > > > > been > > > > > > > > > > >>> involved in three separate moderately time-consuming > > > > > > iterations of > > > > > > > > me > > > > > > > > > > >>> or someone else trying to work out which metrics are > > > > > > available, and > > > > > > > > > > >>> which ones are desired for a given use case. One of > > these > > > > was > > > > > > > > actually > > > > > > > > > > >>> a long-running bug in Kafka Streams' metrics, so not > > only > > > > has > > > > > > this > > > > > > > > > > >>> confusion been a time sink, but it has also led to > > bugs. > > > > > > > > > > >>> > > > > > > > > > > >>> I'm hoping this change won't be too controversial. > > > > > > > > > > >>> > > > > > > > > > > >>> Thanks, > > > > > > > > > > >>> -John > > > > > > > > > > >>> > > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > --000000000000131d98058dd30420--