From dev-return-105807-archive-asf-public=cust-asf.ponee.io@kafka.apache.org Wed Jul 17 15:18:46 2019 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [207.244.88.153]) by mx-eu-01.ponee.io (Postfix) with SMTP id 3868018060E for ; Wed, 17 Jul 2019 17:18:46 +0200 (CEST) Received: (qmail 77055 invoked by uid 500); 17 Jul 2019 15:18:43 -0000 Mailing-List: contact dev-help@kafka.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@kafka.apache.org Delivered-To: mailing list dev@kafka.apache.org Received: (qmail 77042 invoked by uid 99); 17 Jul 2019 15:18:43 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 17 Jul 2019 15:18:43 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 0BDFDC2D69 for ; Wed, 17 Jul 2019 15:18:43 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -0.199 X-Spam-Level: X-Spam-Status: No, score=-0.199 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd4-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=confluent.io Received: from mx1-ec2-va.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id zCHwf3--cQPG for ; Wed, 17 Jul 2019 15:18:41 +0000 (UTC) Received-SPF: Pass (mailfrom) identity=mailfrom; client-ip=209.85.221.45; helo=mail-wr1-f45.google.com; envelope-from=john@confluent.io; receiver= Received: from mail-wr1-f45.google.com (mail-wr1-f45.google.com [209.85.221.45]) by mx1-ec2-va.apache.org (ASF Mail Server at mx1-ec2-va.apache.org) with ESMTPS id B7846BC7AD for ; Wed, 17 Jul 2019 15:18:40 +0000 (UTC) Received: by mail-wr1-f45.google.com with SMTP id x4so25239944wrt.6 for ; Wed, 17 Jul 2019 08:18:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=confluent.io; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to; bh=3gGJ0XTIDP/kQRZ7Lwpi2jFo0ZrZE52lBh2ThOA3MiE=; b=ipjYe7Q+4q6ZvQ4JjGIbUyJ3rSp3ZMRhqwz+vRFI+XnoA+BVsQ4UlXt7ByiG7NEwVY Z6GFmjI53in8XD3CH9BlR0qlzeYP1oi9W72WkW2HJ9zN+mrEVQTpdDpzI+GfT6d+Ly5Q PDAuZW7tiJ2wgVRFZUWMTyHiaqFelgAmd3X1X+PqOu55EZx66f61tly8M75pKBHSaWz0 CjtJntZWGnM9QNCsV3Vut8WQPiIE+0GMiGJoZVRsXivVCyA6LlSFN4NlQjk7HSb6pKLW 5EHCI7Tw3GUUk1ynoaiqMexfkTlK+GRd/Ac33gtyWxQL/t5alLXty2wcCYSl7kiW5TGq MZJw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to; bh=3gGJ0XTIDP/kQRZ7Lwpi2jFo0ZrZE52lBh2ThOA3MiE=; b=KPsqXyR2OD5GivYI91U9qKTcJxqZDikSgZxHdtQR/Os5I0KS7JDy9/nZvpq33KtmlU N2NIIuRYgjA6Fe+O8vcrPnKezpBsNQSurbB5244Ck/hVsfq0WGfswS24NwBOIYp0Hc5y p5F4WrFLpLYxErsVqDNIWLYTawyzMIWYYz+RHMewR6JLzLb03gBuRDYSMJf1ucqZDi5Y ZxQAZs4E5y3iVhlyCYmRRRR8MDiUnUbKCqrL7f9vO7KciDOe7tprow2lqzSDRzUB+Xzw yi+AmCnoWH/9sQ+4m2fdI/p5dTiP9vMuYDHP7rjRPrCXlRdq39QTHKuzRGW11UHilOrf alpg== X-Gm-Message-State: APjAAAWUTOyBXGnpF0PICL3n7hdLb0elZwhSehDtoa4/7pB7FXPp93Oc b92jTa3pSPr65h0xDBnkdZ8zlhcjmc8rlxCNVsvCi9iuKUs= X-Google-Smtp-Source: APXvYqyzy4N6bREkfgRhuWkBMQy11tcFWeggkOnEQwXwbLdwtnE4iDgjNjmW3JEv103mx7FdUlzQV3U58m12CiGRj9w= X-Received: by 2002:adf:c803:: with SMTP id d3mr20499197wrh.130.1563376713287; Wed, 17 Jul 2019 08:18:33 -0700 (PDT) MIME-Version: 1.0 References: <6774e884-7165-0d20-b537-43ddf437657f@confluent.io> In-Reply-To: <6774e884-7165-0d20-b537-43ddf437657f@confluent.io> From: John Roesler Date: Wed, 17 Jul 2019 10:18:21 -0500 Message-ID: Subject: Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics To: dev@kafka.apache.org Content-Type: text/plain; charset="UTF-8" Thanks for the discussion, all. I've done a little more research into the statistical terminology. Matthias is correct, "running" and "moving" appear to be synonyms. Unfortunately, both can be computed either over a window of the last N measurements or over all prior measurements. "Moving" just signifies that the statistic is computed over a "live" data set, i.e., a continuous stream of measurements, and the expectation is that the stat would be updated in response to new measurements. I found https://en.wikipedia.org/wiki/Moving_average to have a pretty good overview of the whole picture. After considering the discussion so far and some light reading, it seems like "Cumulative" is truly the correct term for the all-time metrics: > In a cumulative moving average, the data arrive in an > ordered datum stream, and the user would like to get > the average of all of the data up until the current datum > point. I know that we previously felt that "cumulative" was too much of a mouthful, but it seems like our quest for a terser term led us into a briar patch. Also, now there is an independent source (the wiki page) indicating that this is indeed the correct term, and it doesn't offer any synonyms to choose from. Maybe we can take comfort in the fact that we'll rarely be saying the name of the classes out loud. As far as moving stats that operate over a window of the last N measurements, there are multiple options, including Simple (unweighted), Weighted, and Exponentially Weighted, and presumably infinite variations with other weighting functions. In our domain, there is only one weighting function available, but it's still more self-documenting and future-proof to specify the type of windowed statistic. Therefore, I'm proposing "Simple" as the term for the windowed (aka sampled) stats, while keeping Windowed in the name to distinguish it from the all-time metrics. > In financial applications a simple moving average (SMA) > is the unweighted mean of the previous n data. Therefore, we would have the proposed matrix: SimpleWindowedSum, SimpleWindowedCount CumulativeSum, CumulativeCount Again, all these proposed names are less pithy than we might wish, but the whole point of this exercise is to demystify and disambiguate them. It seems like the discussion so far illustrates the futility of trying to find names that are both short and descriptive. How does that sound? -John On Tue, Jul 16, 2019 at 4:43 PM Matthias J. Sax wrote: > > It's a fair point that Ryanne raises. However, "running sum" is the same > as "moving sum" from my understanding. > > The issue is still, that `Sum` and `Count` which seem to be the cleanest > names cannot be used. While I agree that `TotalSum` and `TotalCount` is > somewhat redundant, I still think it the best suggestion so far. > > For the "sampled" version, I am personally fine with either `MovingXxx`, > `WindowedXxx`, or `RunningXxx` -- to me, that are all equally good to > describe the semantics. > > > > -Matthias > > > > On 7/16/19 2:25 PM, Sophie Blee-Goldman wrote: > > I'm +1 on Windowed, was about to suggest that as I was catching up on the > > discussion but Bill beat me to it :) > > > > On Tue, Jul 16, 2019 at 2:23 PM Bill Bejeck wrote: > > > >> Hi John, > >> > >> Thanks for the updates. > >> > >> I like RunningCount and RunningSum. > >> > >> What about WindowedCount, WindowedSum instead of Moving? > >> I'm just throwing this out there as Windowed seems more intuitive to me, > >> but I'm not married to the idea. > >> > >> -Bill > >> > >> On Tue, Jul 16, 2019 at 5:09 PM John Roesler wrote: > >> > >>> No worries! Choosing good public API names is a high-impact design > >>> activity. > >>> > >>> Matthias, Bruno, Bill, and Stanislav, > >>> > >>> You've all contributed to this discussion or the vote so far... How do > >>> you feel about the proposed name change: > >>> > >>> MovingCount, MovingSum (instead of Sampled) > >>> RunningCount, RunningSum (Instead of Total) > >>> > >>> Thanks, > >>> -John > >>> > >>> On Tue, Jul 16, 2019 at 3:04 PM Ryanne Dolan > >>> wrote: > >>>> > >>>> John, that makes sense to me. Sorry for the bikeshedding. > >>>> > >>>> Ryanne > >>>> > >>>> On Tue, Jul 16, 2019 at 12:49 PM John Roesler > >> wrote: > >>>> > >>>>> Thanks for the explanation and the suggestion, Ryanne, > >>>>> > >>>>> I went with "sampled" just because these are instances of > >> SampledStat, > >>>>> which in the Kafka Metrics ecosystem are computed from a window of > >>>>> recent samples. Thinking more about it, the fact that they are > >> sampled > >>>>> and the fact that they are windowed are orthogonal, which is what > >>>>> you're pointing out... sampling by itself doesn't indicate that it's > >> a > >>>>> moving average. > >>>>> > >>>>> Since there is no way in Kafka Metrics for a metric to be sampled and > >>>>> not windowed/moving/decaying, calling them Sampled would never be > >>>>> incorrect. But to someone unfamiliar with the code, it wouldn't > >>>>> immediately suggest the behavior of the metric that actually matters. > >>>>> That is, the behavior that distinguishes the two classes of metrics > >> we > >>>>> want to disambiguate here. > >>>>> > >>>>> It sounds like you'd suggest a new matrix of names: > >>>>> MovingCount, MovingSum > >>>>> RunningCount, RunningSum > >>>>> > >>>>> Are these names unambiguous and self explanatory? > >>>>> > >>>>> Thanks, > >>>>> -John > >>>>> > >>>>> On Tue, Jul 16, 2019 at 12:32 PM Ryanne Dolan >>> > >>>>> wrote: > >>>>>> > >>>>>>> measurements, which decay/expire over time > >>>>>> > >>>>>> Thanks John for the clarification. This was my (re-)reading of the > >>> code, > >>>>>> but this is not what I think of when I hear "sampled". In fact, > >>> you'll > >>>>>> notice that the Wikipedia pages for "Sample (statistics)" and > >> "Sample > >>>>>> (signal processing)" do not contain the words decay, expire, > >> recent, > >>>>>> history, or anything similar. > >>>>>> > >>>>>> Similar to "running", I'd suggest the more correct "moving", as in > >>>>> "moving > >>>>>> average" and "moving sum", which involve looking back N samples, > >>>>> applying a > >>>>>> sliding window, decaying over time etc. > >>>>>> > >>>>>> Ryanne > >>>>>> > >>>>>> On Tue, Jul 16, 2019, 11:58 AM John Roesler > >>> wrote: > >>>>>> > >>>>>>> Thanks for raising this concern, Ryanne, > >>>>>>> > >>>>>>> "Sampled" indicates that the metrics is sampled, namely that we > >>>>>>> maintain a set of samples from recent value measurements, which > >>>>>>> decay/expire over time. So, the metric value is only > >>> representative of > >>>>>>> the recent past. > >>>>>>> > >>>>>>> "Total" indicates that the metric value contains all the > >>> information > >>>>>>> from the creation of the metric. For example., the total sum > >> would > >>>>>>> include all measurements since the app started up. > >>>>>>> > >>>>>>> It seems like your concern is that the word "total" doesn't > >> really > >>>>>>> pinpoint this meaning, which is true. It's especially confusing > >>> that > >>>>>>> another meaning of "total" is synonymous with "sum", rendering > >> the > >>>>>>> name "TotalSum" sort of absurd. > >>>>>>> > >>>>>>> We previously considered "cumulative", which was rejected as a > >>>>>>> mouthful (it's four syllables) . > >>>>>>> > >>>>>>> You mentioned "running", which might be a more appropriate > >> modifier > >>>>>>> (RunningSum and RunningCount). > >>>>>>> > >>>>>>> What would everyone think about that? > >>>>>>> > >>>>>>> Thanks, > >>>>>>> -John > >>>>>>> > >>>>>>> On Tue, Jul 16, 2019 at 11:27 AM Ryanne Dolan < > >>> ryannedolan@gmail.com> > >>>>>>> wrote: > >>>>>>>> > >>>>>>>> John, I mentioned on the VOTE thread that the proposed names > >> are > >>> a > >>>>> bit > >>>>>>>> confusing, > >>>>>>>> > >>>>>>>>> given that "sum", "total", and "count" are roughly > >>> synonymous... > >>>>>>>> > >>>>>>>> In particular, TotalSum is, I think, a "running total", though > >>> the > >>>>> naming > >>>>>>>> doesn't really capture that. > >>>>>>>> > >>>>>>>> I think to avoid confusion, we should define exactly what > >>> "total" and > >>>>>>>> "sampled" are supposed to indicate, and perhaps pick > >> appropriate > >>>>> naming > >>>>>>>> from there. > >>>>>>>> > >>>>>>>> Ryanne > >>>>>>>> > >>>>>>>> > >>>>>>>> On Fri, Jul 12, 2019 at 1:42 PM John Roesler < > >> john@confluent.io> > >>>>> wrote: > >>>>>>>> > >>>>>>>>> Hey, thanks Matthias and Bruno, > >>>>>>>>> > >>>>>>>>> I agree, "Cumulative" is a mouthful. "TotalX" sounds fine to > >>> me. > >>>>>>>>> > >>>>>>>>> Also, yes, I would have liked to not have any modifier for > >>>>>>>>> "non-sampled", but there is a name conflict with Sum. > >>>>>>>>> > >>>>>>>>> I'll update the KIP to reflect "TotalX" and then start the > >> vote > >>>>> thread. > >>>>>>>>> > >>>>>>>>> Thanks again, > >>>>>>>>> -John > >>>>>>>>> > >>>>>>>>> On Fri, Jul 12, 2019 at 11:27 AM Bruno Cadonna < > >>> bruno@confluent.io > >>>>>> > >>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> OK, makes sense. Then, I am in favour of TotalCount and > >>> TotalSum. > >>>>>>>>>> > >>>>>>>>>> Best, > >>>>>>>>>> Bruno > >>>>>>>>>> > >>>>>>>>>> On Fri, Jul 12, 2019 at 12:57 AM Matthias J. Sax < > >>>>>>> matthias@confluent.io> > >>>>>>>>> wrote: > >>>>>>>>>>> > >>>>>>>>>>> `Sum` is an existing name, for the "sampled sum" metric, > >>> that > >>>>> gets > >>>>>>>>>>> deprecated. Hence, we cannot use it. > >>>>>>>>>>> > >>>>>>>>>>> If we cannot use `Sum` and use `TotalSum`, we should also > >>> not > >>>>> use > >>>>>>>>>>> `Count` but `TotalCount` for consistency. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> -Matthias > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> On 7/11/19 12:58 PM, Bruno Cadonna wrote: > >>>>>>>>>>>> Hi John, > >>>>>>>>>>>> > >>>>>>>>>>>> Thank you for the KIP. > >>>>>>>>>>>> > >>>>>>>>>>>> LGTM > >>>>>>>>>>>> > >>>>>>>>>>>> I also do not like CumulativeSum/Count so much. I > >>> propose to > >>>>> just > >>>>>>>>> call > >>>>>>>>>>>> it Sum and Count. > >>>>>>>>>>>> > >>>>>>>>>>>> I understand that you want to unequivocally distinguish > >>> the > >>>>> two > >>>>>>>>> metric > >>>>>>>>>>>> functions by their names, but I have the feeling the > >>> names > >>>>> become > >>>>>>>>>>>> artificially complex. The exact semantics can also be > >>>>> documented > >>>>>>> in > >>>>>>>>>>>> the javadocs, which btw could also be improved in those > >>>>> classes. > >>>>>>>>>>>> > >>>>>>>>>>>> Best, > >>>>>>>>>>>> Bruno > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> > >>>>>>>>>>>> On Thu, Jul 11, 2019 at 8:25 PM Matthias J. Sax < > >>>>>>>>> matthias@confluent.io> wrote: > >>>>>>>>>>>>> > >>>>>>>>>>>>> Thanks for the KIP. Overall LGTM. > >>>>>>>>>>>>> > >>>>>>>>>>>>> The only though I have is, if we may want to use > >>> `TotalSum` > >>>>> and > >>>>>>>>>>>>> `TotalCount` instead of `CumulativeSum/Count` as > >> names? > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> -Matthias > >>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>>>> On 7/11/19 9:31 AM, John Roesler wrote: > >>>>>>>>>>>>>> Hi Kafka devs, > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I'd like to propose KIP-488 as a minor cleanup of > >> some > >>> of > >>>>> our > >>>>>>>>> metric > >>>>>>>>>>>>>> implementations. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> KIP-488: > >> https://cwiki.apache.org/confluence/x/kkAyBw > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Over time, iterative updates to these metrics has > >>> resulted > >>>>> in a > >>>>>>>>> pretty > >>>>>>>>>>>>>> confusing little collection of classes, and I've > >>> personally > >>>>>>> been > >>>>>>>>>>>>>> involved in three separate moderately time-consuming > >>>>>>> iterations of > >>>>>>>>> me > >>>>>>>>>>>>>> or someone else trying to work out which metrics are > >>>>>>> available, and > >>>>>>>>>>>>>> which ones are desired for a given use case. One of > >>> these > >>>>> was > >>>>>>>>> actually > >>>>>>>>>>>>>> a long-running bug in Kafka Streams' metrics, so not > >>> only > >>>>> has > >>>>>>> this > >>>>>>>>>>>>>> confusion been a time sink, but it has also led to > >>> bugs. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> I'm hoping this change won't be too controversial. > >>>>>>>>>>>>>> > >>>>>>>>>>>>>> Thanks, > >>>>>>>>>>>>>> -John > >>>>>>>>>>>>>> > >>>>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>> > >>>>>>> > >>>>>>> > >>>>> > >>> > >> > > >