From dev-return-8992-archive-asf-public=cust-asf.ponee.io@beam.apache.org Fri Apr 6 20:53:53 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 2AF37180649 for ; Fri, 6 Apr 2018 20:53:52 +0200 (CEST) Received: (qmail 248 invoked by uid 500); 6 Apr 2018 18:53:51 -0000 Mailing-List: contact dev-help@beam.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@beam.apache.org Delivered-To: mailing list dev@beam.apache.org Received: (qmail 238 invoked by uid 99); 6 Apr 2018 18:53:50 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 06 Apr 2018 18:53:50 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id E025E180736 for ; Fri, 6 Apr 2018 18:53:49 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 1.869 X-Spam-Level: * X-Spam-Status: No, score=1.869 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=google.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id CxpnrHlwB7fN for ; Fri, 6 Apr 2018 18:53:47 +0000 (UTC) Received: from mail-ua0-f182.google.com (mail-ua0-f182.google.com [209.85.217.182]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 019085F5AF for ; Fri, 6 Apr 2018 18:53:46 +0000 (UTC) Received: by mail-ua0-f182.google.com with SMTP id q26so1330216uab.0 for ; Fri, 06 Apr 2018 11:53:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=55hD5IC8zbxfOsIStuF3iIjReSqypntO3XjkSDD4+aw=; b=oWkwWMYTkdtjtyA/vQA+sXqWNwXfyWtR9iomzDZtfGEcaVoI/edDy1MxVsfNa2vRuR 2jW00+AlNJbpgb0p8LQlRZf40AJJ98QP8RAluQCcYM12eKvJ09RfA68856iDB8faREaM g9lE9vu1p9j8iIs5ZFIxGojWbK+OivTyR5JkAE+vPxHHk/LZw6OfMg/Lw/KBTkOC6tTP PwtTpx19Rp3evA9mq1P+U8u4lwsyNXGvtdhCMIpySs5agSKrw+SnaD+xD5tyOy8pXN4F QzckojqDzfHuJLcDiSZP/rwnsyCw4w9kEqr/uLTrifoydyZymoirIXrXDHtn4SJCC671 FTqQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=55hD5IC8zbxfOsIStuF3iIjReSqypntO3XjkSDD4+aw=; b=FHwPDlqHBs5HMZDwVAjT0LMqx+M7hBv9k3Qdf7H8G4dKkFaJ5LN44xPVG9VqO3Xehq ANH+R7h1eAM1MxcCwJY/uJqdfdmN5AV5QLeTjAl435ww/TPyKR0e+4gHw3nH7QYS8UvL eTYdX0RPE5oOPJ32P/6Zu4zqyTw22Z3OgwHHVkS5yGF1JAVR/qWdpcNCdjM8OA03CiO5 m4+sWTXAxN6pozVXAhY4JOc2COCblqOcuiFCEA2LTPvaLhaek/olzQRukEbI+IarMJiZ 45HEBfE3qv727GifW8Vmo1aMEsbjfjI+D0Fr4vsAipyZ+nD7470h8qXkbRVc9W+dnlQw 7HAw== X-Gm-Message-State: ALQs6tB8TZ/MySl4wHWq1iL04dyiaFD+JoWIy2ldPomRhdvp276Kl6Kl NVCarF+XzS9eFWQvx7I9UaGQpG8PdcAI7FHJNOVkXMAnzMA= X-Google-Smtp-Source: AIpwx49e0WBYe+pnLRd67rZzYmVeoNI8r5yPJMJU9kTZvg6VTwoYS3eZt9D8iKexhL/8pQj/opVu3Dl3dl/rdugVIBo= X-Received: by 10.176.69.182 with SMTP id u51mr17799102uau.93.1523040826099; Fri, 06 Apr 2018 11:53:46 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Kenneth Knowles Date: Fri, 06 Apr 2018 18:53:35 +0000 Message-ID: Subject: Re: About the Gauge metric API To: dev Cc: bchambers@apache.org, Andrea Foegler Content-Type: multipart/alternative; boundary="94eb2c11b8246ba17c0569329334" --94eb2c11b8246ba17c0569329334 Content-Type: text/plain; charset="UTF-8" In terms of natural language, I don't think "gauge" makes sense strings. A gauge measures a quantity. A string is not a quantity. So I like a separate API, like Robert says. Backends can go ahead and implement leaf String and Gauge collectors with the same data structure if they like. In implementation / reporting, it also may sometimes make sense to sum gauges, possibly within a limited scope. Since gauge updates aren't synchronized it would not be perfect but you could ballpark some quantity across the fleet. This wouldn't make sense for strings. Just another consequence of the fact that gauges measure a quantity/ On Fri, Apr 6, 2018 at 11:39 AM Bill Neubauer wrote: > Thanks for unraveling those themes, Pablo! > > 1. Seems reasonable in light of behaviors metrics backends support. > 2. Those same backends support histogramming of data, so having integer > types is very useful. > 3. I believe that is the case, for the reasons I mentioned earlier, Gauges > should only clobber previously values reported by the same entity. Two > workers with the same gauge should not be overwriting each other's values, > only their own. This implies per-worker segmentation. > > > On Fri, Apr 6, 2018 at 11:35 AM Pablo Estrada wrote: > >> Nobody wants to get rid of Gauges. I see that we have three separate >> themes being discussed here, and I think it's useful to point them out and >> address them independently: >> >> 1. Whether Gauges should change to hold string values. >> 2. If Gauges are to support string values, whether Gauges should also >> continue to have an int API. >> 3. Whether Beam should support some sort of label / tag / worker-id for >> aggregation of Gauges (maybe other metrics?) >> >> -P. >> >> On Fri, Apr 6, 2018 at 11:21 AM Ben Chambers >> wrote: >> >>> Gauges are incredibly useful for exposing the current state of the >>> system. For instance, number of elements in a queue, current memory usage, >>> number of RPCs in flight, etc. As mentioned above, these concepts exist in >>> numerous systems for monitoring distributed environments, including >>> Stackdriver Monitoring. The key to making them work is the addition of >>> labels or tags, which as an aside are also useful for *all* metric types, >>> not just Gauges. >>> >>> If Beam gets rid of Gauges, how would we go about exporting "current" >>> values like memory usage, RPCs in flight, etc.? >>> >>> -- Ben >>> >>> On Fri, Apr 6, 2018 at 11:13 AM Kenneth Knowles wrote: >>> >>>> Just naively - the use cases that Gauge addresses seem relevant, and >>>> the information seems feasible to gather and present. The bit that doesn't >>>> seem to make sense is aggregating gauges by clobbering each other. So I >>>> think that's just +1 Ben? >>>> >>>> On Fri, Apr 6, 2018 at 10:26 AM Raghu Angadi >>>> wrote: >>>> >>>>> I am not opposed to removing other data types, though they are extra >>>>> convenience for user. >>>>> >>>>> In Scott's example above, if the metric is a counter, what are the >>>>> guarantees provided? E.g. would it match the global count using GBK? If >>>>> yes, then gauges (especially per-key gauges) can be very useful too (e.g. >>>>> backlog for each Kafka partition/split). >>>>> >>>>> On Fri, Apr 6, 2018 at 10:01 AM Robert Bradshaw >>>>> wrote: >>>>> >>>>>> A String API makes it clear(er) that the values will not be >>>>>> aggregated in any way across workers. I don't think retaining both APIs >>>>>> (except for possibly some short migration period) worthwhile. On another >>>>>> note, I still find the distributed gague API to be a bit odd in general. >>>>>> >>>>>> On Fri, Apr 6, 2018 at 9:46 AM Raghu Angadi >>>>>> wrote: >>>>>> >>>>>>> I would be in favor of replacing the existing Gauge.set(long) API >>>>>>>> with the String version and removing the old one. This would be a breaking >>>>>>>> change. However this is a relatively new API and is still marked >>>>>>>> @Experimental. Keeping the old API would retain the potential confusion. >>>>>>>> It's better to simplify the API surface: having two APIs makes it less >>>>>>>> clear which one users should choose. >>>>>>> >>>>>>> >>>>>>> Supporting additional data types sounds good. But the above states >>>>>>> string API will replace the existing API. I do not see how string API makes >>>>>>> the semantics more clear. Semantically both are same to the user. >>>>>>> >>>>>>> On Fri, Apr 6, 2018 at 9:31 AM Pablo Estrada >>>>>>> wrote: >>>>>>> >>>>>>>> Hi Ben : D >>>>>>>> >>>>>>>> Sure, that's reasonable. And perhaps I started the discussion in >>>>>>>> the wrong direction. I'm not questioning the utility of Gauge metrics. >>>>>>>> >>>>>>>> What I'm saying is that Beam only supports integers,, but Gauges >>>>>>>> are aggregated by dropping old values depending on their update times; so >>>>>>>> it might be desirable to not restrict the data type to just integers. >>>>>>>> >>>>>>>> -P. >>>>>>>> >>>>>>>> On Fri, Apr 6, 2018 at 9:19 AM Ben Chambers >>>>>>>> wrote: >>>>>>>> >>>>>>>>> See for instance how gauge metrics are handled in Prometheus, >>>>>>>>> Datadog and Stackdriver monitoring. Gauges are perfect for use in >>>>>>>>> distributed systems, they just need to be properly labeled. Perhaps we >>>>>>>>> should apply a default tag or allow users to specify one. >>>>>>>>> >>>>>>>>> On Fri, Apr 6, 2018, 9:14 AM Ben Chambers >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Some metrics backend label the value, for instance with the >>>>>>>>>> worker that sent it. Then the aggregation is latest per label. This makes >>>>>>>>>> it useful for holding values such as "memory usage" that need to hold >>>>>>>>>> current value. >>>>>>>>>> >>>>>>>>>> On Fri, Apr 6, 2018, 9:00 AM Scott Wegner >>>>>>>>>> wrote: >>>>>>>>>> >>>>>>>>>>> +1 on the proposal to support a "String" gauge. >>>>>>>>>>> >>>>>>>>>>> To expand a bit, the current API doesn't make it clear that the >>>>>>>>>>> gauge value is based on local state. If a runner chooses to parallelize a >>>>>>>>>>> DoFn across many workers, each worker will have its own local Gauge metric >>>>>>>>>>> and its updates will overwrite other values. For example, from the API it >>>>>>>>>>> looks like you could use a gauge to implement your own element count metric: >>>>>>>>>>> >>>>>>>>>>> long count = 0; >>>>>>>>>>> @ProcessElement >>>>>>>>>>> public void processElement(ProcessContext c) { >>>>>>>>>>> myGauge.set(++count); >>>>>>>>>>> c.output(c.element()); >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> This looks correct, but each worker has their own local 'count' >>>>>>>>>>> field, and gauge metric updates from parallel workers will overwrite each >>>>>>>>>>> other rather than get aggregated. So the final value would be "the number >>>>>>>>>>> of elements processed on one of the workers". (The correct implementation >>>>>>>>>>> uses a Counter metric). >>>>>>>>>>> >>>>>>>>>>> I would be in favor of replacing the existing Gauge.set(long) >>>>>>>>>>> API with the String version and removing the old one. This would be a >>>>>>>>>>> breaking change. However this is a relatively new API and is still marked >>>>>>>>>>> @Experimental. Keeping the old API would retain the potential confusion. >>>>>>>>>>> It's better to simplify the API surface: having two APIs makes it less >>>>>>>>>>> clear which one users should choose. >>>>>>>>>>> >>>>>>>>>>> On Fri, Apr 6, 2018 at 8:28 AM Pablo Estrada >>>>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>>> Hello all, >>>>>>>>>>>> As I was working on adding support for Gauges in Dataflow, some >>>>>>>>>>>> noted that Gauge is a fairly unusual kind of metric for a distributed >>>>>>>>>>>> environment, since many workers will report different values and stomp on >>>>>>>>>>>> each other's all the time. >>>>>>>>>>>> >>>>>>>>>>>> We also looked at Flink and Dropwizard Gauge metrics [1][2], >>>>>>>>>>>> and we found that these use generics, and Flink explicitly mentions that a >>>>>>>>>>>> toString implementation is required[3]. >>>>>>>>>>>> >>>>>>>>>>>> With that in mind, I'm thinking that it might make sense to 1) >>>>>>>>>>>> expand Gauge to support string values (keep int-based API for backwards >>>>>>>>>>>> compatibility), and migrate it to use string behind the covers. >>>>>>>>>>>> >>>>>>>>>>>> What does everyone think about this? >>>>>>>>>>>> >>>>>>>>>>>> Best >>>>>>>>>>>> -P. >>>>>>>>>>>> >>>>>>>>>>>> 1 - >>>>>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/metrics.html#metric-types >>>>>>>>>>>> 2 - https://metrics.dropwizard.io/3.1.0/manual/core/#gauges >>>>>>>>>>>> 3 - >>>>>>>>>>>> https://github.com/apache/flink/blob/master/docs/monitoring/metrics.md#gauge >>>>>>>>>>>> JIRA issue for Gauge metrics - >>>>>>>>>>>> https://issues.apache.org/jira/browse/BEAM-1616 >>>>>>>>>>>> -- >>>>>>>>>>>> Got feedback? go/pabloem-feedback >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> -- >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Got feedback? http://go/swegner-feedback >>>>>>>>>>> >>>>>>>>>> -- >>>>>>>> Got feedback? go/pabloem-feedback >>>>>>>> >>>>>>>> >>>>>>> -- >> Got feedback? go/pabloem-feedback >> >> > --94eb2c11b8246ba17c0569329334 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
In terms of n= atural language, I don't think "gauge" makes sense strings. A= gauge measures a quantity. A string is not a quantity. So I like a separat= e API, like Robert says. Backends can go ahead and implement leaf String an= d Gauge collectors with the same data structure if they like.

In implementation / reporting, it als= o may sometimes make sense to sum gauges, possibly within a limited scope. = Since gauge updates aren't synchronized it would not be perfect but you= could ballpark some quantity across the fleet. This wouldn't make sens= e for strings. Just another consequence of the fact that gauges measure a q= uantity/

On Fri,= Apr 6, 2018 at 11:39 AM Bill Neubauer <wcn@google.com> wrote:
Thanks for unraveling those themes, Pablo!

1. Seems reasonable in light of behaviors metrics backends support.
<= div>2. Those same backends support histogramming of data, so having integer= types is very useful.
3. I believe that is the case, for the rea= sons I mentioned earlier, Gauges should only clobber previously values repo= rted by the same entity. Two workers with the same gauge should not be over= writing each other's values, only their own. This implies per-worker se= gmentation.


On Fri, Apr 6, 2018 at 11:35 AM Pablo Estrada <pabloem@google.com> wrote:=
Nobody wants to g= et rid of Gauges. I see that we have three separate themes being discussed = here, and I think it's useful to point them out and address them indepe= ndently:

1. Whether Gauges should change to hold string = values.
2. If Gauges are to support string values, whether Gauges= should also continue to have an int API.
3. Whether Beam should = support some sort of label / tag / worker-id for aggregation of Gauges (may= be other metrics?)

-P.

On Fri, Apr 6, 2018 at 11:21 AM Ben Chamb= ers <bchambers= @apache.org> wrote:
Gauges are incredibly useful for exposing the current state of the= system. For instance, number of elements in a queue, current memory usage,= number of RPCs in flight, etc. As mentioned above, these concepts exist in= numerous systems for monitoring distributed environments, including Stackd= river Monitoring. The key to making them work is the addition of labels or = tags, which as an aside are also useful for *all* metric types, not just Ga= uges.

If Beam gets rid of Gauges, how would we go about = exporting "current" values like memory usage, RPCs in flight, etc= .?

-- Ben

On Fri, Apr 6, 2018 at 11:13 AM Kenneth Knowles <klk@google.com> wro= te:
Just naive= ly - the use cases that Gauge addresses seem relevant, and the information = seems feasible to gather and present. The bit that doesn't seem to make= sense is aggregating gauges by clobbering each other. So I think that'= s just +1 Ben?

On Fri,= Apr 6, 2018 at 10:26 AM Raghu Angadi <rangadi@google.com> wrote:
I am not opposed to removing ot= her data types, though they are extra convenience for user.

<= /div>In Scott's example above, if the metric is a counter, what are the= guarantees provided? E.g. would it match the global count using GBK? If ye= s, then gauges (especially per-key gauges) can be very useful too (e.g. bac= klog for each Kafka partition/split).

<= div dir=3D"ltr">On Fri, Apr 6, 2018 at 10:01 AM Robert Bradshaw <robertwb@google.com&g= t; wrote:
A String= API makes it clear(er) that the values will not be aggregated in any way a= cross workers. I don't think retaining both APIs (except for possibly s= ome short migration period) worthwhile. On another note, I still find the d= istributed gague API to be a bit odd in general.=C2=A0

On Fri, Apr 6, 2018 at 9:46 AM Raghu Anga= di <rangadi@goog= le.com> wrote:
I = would be in favor of replacing the existing Gauge.set(long) API with the St= ring version and removing the old one. This would be a breaking change. How= ever this is a relatively new API and is still marked @Experimental. Keepin= g the old API would retain the potential confusion. It's better to simp= lify the API surface: having two APIs makes it less clear which one users s= hould choose.

Supporting additional = data types sounds good. But the above states string API will replace the ex= isting API. I do not see how string API makes the semantics more clear.=C2= =A0=C2=A0Semantically both are same to the user.=C2=A0

On Fri, Apr 6, 2018 at 9:31 AM Pablo= Estrada <pabloe= m@google.com> wrote:
Hi Ben : D

Sure, that's reasonable. And pe= rhaps I started the discussion in the wrong direction. I'm not question= ing the utility of Gauge metrics.=C2=A0

What I'= ;m saying is that Beam only supports integers,, but Gauges are aggregated b= y dropping old values depending on their update times; so it might be desir= able to not restrict the data type to just integers.

-P.

On F= ri, Apr 6, 2018 at 9:19 AM Ben Chambers <bchambers@apache.org> wrote:
See for instance how gauge metrics are ha= ndled in Prometheus, Datadog and Stackdriver monitoring. Gauges are perfect= for use in distributed systems, they just need to be properly labeled. Per= haps we should apply a default tag or allow users to specify one.
On Fri, Apr 6, 2018, 9:14 = AM Ben Chambers <bchambers@apache.org> wrote:
Some metrics backend label the value, for instance with the worker that= sent it. Then the aggregation is latest per label. This makes it useful fo= r holding values such as "memory usage" that need to hold current= value.

On Fri, Apr 6, 2= 018, 9:00 AM Scott Wegner <swegner@google.com> wrote:
+1 on the proposal to support a "String&qu= ot; gauge.=C2=A0

To expand a bit, the current API doesn&= #39;t make it clear that the gauge value is based on local state. If a runn= er chooses to parallelize a DoFn across many workers, each worker will have= its own local Gauge metric and its updates will overwrite other values. Fo= r example, from the API it looks like you could use a gauge to implement yo= ur own element count metric:

long count =3D 0;
@ProcessElement
public void processElement(ProcessContext = c) {
=C2=A0 myGauge.set(++count);
=C2=A0 c.output(c.ele= ment());
}

This looks correct, but each = worker has their own local 'count' field, and gauge metric updates = from parallel workers will overwrite each other rather than get aggregated.= So the final value would be "the number of elements processed on one = of the workers". (The correct implementation uses a Counter metric).

I would be in favor of replacing the existing Gauge= .set(long) API with the String version and removing the old one. This would= be a breaking change. However this is a relatively new API and is still ma= rked @Experimental. Keeping the old API would retain the potential confusio= n. It's better to simplify the API surface: having two APIs makes it le= ss clear which one users should choose.

On Fri, Apr 6, 2018 at 8:28 AM Pablo Estrada <pabloem@google.com> wrote:
Hello= all,
As I was working on adding support for Gauges in Dataflow, some n= oted that Gauge is a fairly unusual kind of metric for a distributed enviro= nment, since many workers will report different values and stomp on each ot= her's all the time.

We also looked at Flink an= d Dropwizard Gauge metrics [1][2], and we found that these use generics, an= d Flink explicitly mentions that a toString implementation is required[3].<= /div>

With that in mind, I'm thinking that it might = make sense to 1) expand Gauge to support string values (keep int-based API = for backwards compatibility), and migrate it to use string behind the cover= s.

What does everyone think about this?
=
Best
-P.

--
Got feedback? go/pabloem-feedback
--
--
Got f= eedback? go/pabloem-feedback
--
Got feedback? go/pabloem-feedb= ack
--94eb2c11b8246ba17c0569329334--