kafka-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From John Roesler <j...@confluent.io>
Subject Re: [DISCUSS] KIP-488: Clean up Sum,Count,Total Metrics
Date Wed, 17 Jul 2019 15:18:21 GMT
Thanks for the discussion, all.

I've done a little more research into the statistical terminology.
Matthias is correct, "running" and "moving" appear to be synonyms.
Unfortunately, both can be computed either over a window of the last N
measurements or over all prior measurements. "Moving" just signifies
that the statistic is computed over a "live" data set, i.e., a
continuous stream of measurements, and the expectation is that the
stat would be updated in response to new measurements.

I found https://en.wikipedia.org/wiki/Moving_average to have a pretty
good overview of the whole picture.

After considering the discussion so far and some light reading, it
seems like "Cumulative" is truly the correct term for the all-time
metrics:

> In a cumulative moving average, the data arrive in an
> ordered datum stream, and the user would like to get
> the average of all of the data up until the current datum
> point.

I know that we previously felt that "cumulative" was too much of a
mouthful, but it seems like our quest for a terser term led us into a
briar patch. Also, now there is an independent source (the wiki page)
indicating that this is indeed the correct term, and it doesn't offer
any synonyms to choose from. Maybe we can take comfort in the fact
that we'll rarely be saying the name of the classes out loud.

As far as moving stats that operate over a window of the last N
measurements, there are multiple options, including Simple
(unweighted), Weighted, and Exponentially Weighted, and presumably
infinite variations with other weighting functions. In our domain,
there is only one weighting function available, but it's still more
self-documenting and future-proof to specify the type of windowed
statistic. Therefore, I'm proposing "Simple" as the term for the
windowed (aka sampled) stats, while keeping Windowed in the name to
distinguish it from the all-time metrics.

> In financial applications a simple moving average (SMA)
> is the unweighted mean of the previous n data.

Therefore, we would have the proposed matrix:

SimpleWindowedSum, SimpleWindowedCount
CumulativeSum, CumulativeCount

Again, all these proposed names are less pithy than we might wish, but
the whole point of this exercise is to demystify and disambiguate
them. It seems like the discussion so far illustrates the futility of
trying to find names that are both short and descriptive.

How does that sound?
-John

On Tue, Jul 16, 2019 at 4:43 PM Matthias J. Sax <matthias@confluent.io> wrote:
>
> It's a fair point that Ryanne raises. However, "running sum" is the same
> as "moving sum" from my understanding.
>
> The issue is still, that `Sum` and `Count` which seem to be the cleanest
> names cannot be used. While I agree that `TotalSum` and `TotalCount` is
> somewhat redundant, I still think it the best suggestion so far.
>
> For the "sampled" version, I am personally fine with either `MovingXxx`,
> `WindowedXxx`, or `RunningXxx` -- to me, that are all equally good to
> describe the semantics.
>
>
>
> -Matthias
>
>
>
> On 7/16/19 2:25 PM, Sophie Blee-Goldman wrote:
> > I'm +1 on Windowed, was about to suggest that as I was catching up on the
> > discussion but Bill beat me to it :)
> >
> > On Tue, Jul 16, 2019 at 2:23 PM Bill Bejeck <bbejeck@gmail.com> wrote:
> >
> >> Hi John,
> >>
> >> Thanks for the updates.
> >>
> >> I like RunningCount and RunningSum.
> >>
> >> What about WindowedCount, WindowedSum instead of Moving?
> >> I'm just throwing this out there as Windowed seems more intuitive to me,
> >> but I'm not married to the idea.
> >>
> >> -Bill
> >>
> >> On Tue, Jul 16, 2019 at 5:09 PM John Roesler <john@confluent.io> wrote:
> >>
> >>> No worries! Choosing good public API names is a high-impact design
> >>> activity.
> >>>
> >>> Matthias, Bruno, Bill, and Stanislav,
> >>>
> >>> You've all contributed to this discussion or the vote so far... How do
> >>> you feel about the proposed name change:
> >>>
> >>> MovingCount, MovingSum   (instead of Sampled)
> >>> RunningCount, RunningSum   (Instead of Total)
> >>>
> >>> Thanks,
> >>> -John
> >>>
> >>> On Tue, Jul 16, 2019 at 3:04 PM Ryanne Dolan <ryannedolan@gmail.com>
> >>> wrote:
> >>>>
> >>>> John, that makes sense to me. Sorry for the bikeshedding.
> >>>>
> >>>> Ryanne
> >>>>
> >>>> On Tue, Jul 16, 2019 at 12:49 PM John Roesler <john@confluent.io>
> >> wrote:
> >>>>
> >>>>> Thanks for the explanation and the suggestion, Ryanne,
> >>>>>
> >>>>> I went with "sampled" just because these are instances of
> >> SampledStat,
> >>>>> which in the Kafka Metrics ecosystem are computed from a window
of
> >>>>> recent samples. Thinking more about it, the fact that they are
> >> sampled
> >>>>> and the fact that they are windowed are orthogonal, which is what
> >>>>> you're pointing out... sampling by itself doesn't indicate that
it's
> >> a
> >>>>> moving average.
> >>>>>
> >>>>> Since there is no way in Kafka Metrics for a metric to be sampled
and
> >>>>> not windowed/moving/decaying, calling them Sampled would never be
> >>>>> incorrect. But to someone unfamiliar with the code, it wouldn't
> >>>>> immediately suggest the behavior of the metric that actually matters.
> >>>>> That is, the behavior that distinguishes the two classes of metrics
> >> we
> >>>>> want to disambiguate here.
> >>>>>
> >>>>> It sounds like you'd suggest a new matrix of names:
> >>>>> MovingCount, MovingSum
> >>>>> RunningCount, RunningSum
> >>>>>
> >>>>> Are these names unambiguous and self explanatory?
> >>>>>
> >>>>> Thanks,
> >>>>> -John
> >>>>>
> >>>>> On Tue, Jul 16, 2019 at 12:32 PM Ryanne Dolan <ryannedolan@gmail.com
> >>>
> >>>>> wrote:
> >>>>>>
> >>>>>>> measurements, which decay/expire over time
> >>>>>>
> >>>>>> Thanks John for the clarification. This was my (re-)reading
of the
> >>> code,
> >>>>>> but this is not what I think of when I hear "sampled". In fact,
> >>> you'll
> >>>>>> notice that the Wikipedia pages for "Sample (statistics)" and
> >> "Sample
> >>>>>> (signal processing)" do not contain the words decay, expire,
> >> recent,
> >>>>>> history, or anything similar.
> >>>>>>
> >>>>>> Similar to "running", I'd suggest the more correct "moving",
as in
> >>>>> "moving
> >>>>>> average" and "moving sum", which involve looking back N samples,
> >>>>> applying a
> >>>>>> sliding window, decaying over time etc.
> >>>>>>
> >>>>>> Ryanne
> >>>>>>
> >>>>>> On Tue, Jul 16, 2019, 11:58 AM John Roesler <john@confluent.io>
> >>> wrote:
> >>>>>>
> >>>>>>> Thanks for raising this concern, Ryanne,
> >>>>>>>
> >>>>>>> "Sampled" indicates that the metrics is sampled, namely
that we
> >>>>>>> maintain a set of samples from recent value measurements,
which
> >>>>>>> decay/expire over time. So, the metric value is only
> >>> representative of
> >>>>>>> the recent past.
> >>>>>>>
> >>>>>>> "Total" indicates that the metric value contains all the
> >>> information
> >>>>>>> from the creation of the metric. For example., the total
sum
> >> would
> >>>>>>> include all measurements since the app started up.
> >>>>>>>
> >>>>>>> It seems like your concern is that the word "total" doesn't
> >> really
> >>>>>>> pinpoint this meaning, which is true. It's especially confusing
> >>> that
> >>>>>>> another meaning of "total" is synonymous with "sum", rendering
> >> the
> >>>>>>> name "TotalSum" sort of absurd.
> >>>>>>>
> >>>>>>> We previously considered "cumulative", which was rejected
as a
> >>>>>>> mouthful (it's four syllables) .
> >>>>>>>
> >>>>>>> You mentioned "running", which might be a more appropriate
> >> modifier
> >>>>>>> (RunningSum and RunningCount).
> >>>>>>>
> >>>>>>> What would everyone think about that?
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> -John
> >>>>>>>
> >>>>>>> On Tue, Jul 16, 2019 at 11:27 AM Ryanne Dolan <
> >>> ryannedolan@gmail.com>
> >>>>>>> wrote:
> >>>>>>>>
> >>>>>>>> John, I mentioned on the VOTE thread that the proposed
names
> >> are
> >>> a
> >>>>> bit
> >>>>>>>> confusing,
> >>>>>>>>
> >>>>>>>>> given that "sum", "total", and "count" are roughly
> >>> synonymous...
> >>>>>>>>
> >>>>>>>> In particular, TotalSum is, I think, a "running total",
though
> >>> the
> >>>>> naming
> >>>>>>>> doesn't really capture that.
> >>>>>>>>
> >>>>>>>> I think to avoid confusion, we should define exactly
what
> >>> "total" and
> >>>>>>>> "sampled" are supposed to indicate, and perhaps pick
> >> appropriate
> >>>>> naming
> >>>>>>>> from there.
> >>>>>>>>
> >>>>>>>> Ryanne
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Fri, Jul 12, 2019 at 1:42 PM John Roesler <
> >> john@confluent.io>
> >>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hey, thanks Matthias and Bruno,
> >>>>>>>>>
> >>>>>>>>> I agree, "Cumulative" is a mouthful. "TotalX" sounds
fine to
> >>> me.
> >>>>>>>>>
> >>>>>>>>> Also, yes, I would have liked to not have any modifier
for
> >>>>>>>>> "non-sampled", but there is a name conflict with
Sum.
> >>>>>>>>>
> >>>>>>>>> I'll update the KIP to reflect "TotalX" and then
start the
> >> vote
> >>>>> thread.
> >>>>>>>>>
> >>>>>>>>> Thanks again,
> >>>>>>>>> -John
> >>>>>>>>>
> >>>>>>>>> On Fri, Jul 12, 2019 at 11:27 AM Bruno Cadonna <
> >>> bruno@confluent.io
> >>>>>>
> >>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>> OK, makes sense. Then, I am in favour of TotalCount
and
> >>> TotalSum.
> >>>>>>>>>>
> >>>>>>>>>> Best,
> >>>>>>>>>> Bruno
> >>>>>>>>>>
> >>>>>>>>>> On Fri, Jul 12, 2019 at 12:57 AM Matthias J.
Sax <
> >>>>>>> matthias@confluent.io>
> >>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> `Sum` is an existing name, for the "sampled
sum" metric,
> >>> that
> >>>>> gets
> >>>>>>>>>>> deprecated. Hence, we cannot use it.
> >>>>>>>>>>>
> >>>>>>>>>>> If we cannot use `Sum` and use `TotalSum`,
we should also
> >>> not
> >>>>> use
> >>>>>>>>>>> `Count` but `TotalCount` for consistency.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> -Matthias
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On 7/11/19 12:58 PM, Bruno Cadonna wrote:
> >>>>>>>>>>>> Hi John,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thank you for the KIP.
> >>>>>>>>>>>>
> >>>>>>>>>>>> LGTM
> >>>>>>>>>>>>
> >>>>>>>>>>>> I also do not like CumulativeSum/Count
so much. I
> >>> propose to
> >>>>> just
> >>>>>>>>> call
> >>>>>>>>>>>> it Sum and Count.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I understand that you want to unequivocally
distinguish
> >>> the
> >>>>> two
> >>>>>>>>> metric
> >>>>>>>>>>>> functions by their names, but I have
the feeling the
> >>> names
> >>>>> become
> >>>>>>>>>>>> artificially complex. The exact semantics
can also be
> >>>>> documented
> >>>>>>> in
> >>>>>>>>>>>> the javadocs, which btw could also be
improved in those
> >>>>> classes.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best,
> >>>>>>>>>>>> Bruno
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> On Thu, Jul 11, 2019 at 8:25 PM Matthias
J. Sax <
> >>>>>>>>> matthias@confluent.io> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks for the KIP. Overall LGTM.
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> The only though I have is, if we
may want to use
> >>> `TotalSum`
> >>>>> and
> >>>>>>>>>>>>> `TotalCount` instead of `CumulativeSum/Count`
as
> >> names?
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> -Matthias
> >>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> On 7/11/19 9:31 AM, John Roesler
wrote:
> >>>>>>>>>>>>>> Hi Kafka devs,
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'd like to propose KIP-488
as a minor cleanup of
> >> some
> >>> of
> >>>>> our
> >>>>>>>>> metric
> >>>>>>>>>>>>>> implementations.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> KIP-488:
> >> https://cwiki.apache.org/confluence/x/kkAyBw
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Over time, iterative updates
to these metrics has
> >>> resulted
> >>>>> in a
> >>>>>>>>> pretty
> >>>>>>>>>>>>>> confusing little collection
of classes, and I've
> >>> personally
> >>>>>>> been
> >>>>>>>>>>>>>> involved in three separate moderately
time-consuming
> >>>>>>> iterations of
> >>>>>>>>> me
> >>>>>>>>>>>>>> or someone else trying to work
out which metrics are
> >>>>>>> available, and
> >>>>>>>>>>>>>> which ones are desired for a
given use case. One of
> >>> these
> >>>>> was
> >>>>>>>>> actually
> >>>>>>>>>>>>>> a long-running bug in Kafka
Streams' metrics, so not
> >>> only
> >>>>> has
> >>>>>>> this
> >>>>>>>>>>>>>> confusion been a time sink,
but it has also led to
> >>> bugs.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> I'm hoping this change won't
be too controversial.
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>> -John
> >>>>>>>>>>>>>>
> >>>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>
> >>>
> >>
> >
>

Mime
View raw message