beam-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Kenneth Knowles <...@google.com>
Subject Re: About the Gauge metric API
Date Fri, 06 Apr 2018 18:13:17 GMT
Just naively - the use cases that Gauge addresses seem relevant, and the
information seems feasible to gather and present. The bit that doesn't seem
to make sense is aggregating gauges by clobbering each other. So I think
that's just +1 Ben?

On Fri, Apr 6, 2018 at 10:26 AM Raghu Angadi <rangadi@google.com> wrote:

> I am not opposed to removing other data types, though they are extra
> convenience for user.
>
> In Scott's example above, if the metric is a counter, what are the
> guarantees provided? E.g. would it match the global count using GBK? If
> yes, then gauges (especially per-key gauges) can be very useful too (e.g.
> backlog for each Kafka partition/split).
>
> On Fri, Apr 6, 2018 at 10:01 AM Robert Bradshaw <robertwb@google.com>
> wrote:
>
>> A String API makes it clear(er) that the values will not be aggregated in
>> any way across workers. I don't think retaining both APIs (except for
>> possibly some short migration period) worthwhile. On another note, I still
>> find the distributed gague API to be a bit odd in general.
>>
>> On Fri, Apr 6, 2018 at 9:46 AM Raghu Angadi <rangadi@google.com> wrote:
>>
>>> I would be in favor of replacing the existing Gauge.set(long) API with
>>>> the String version and removing the old one. This would be a breaking
>>>> change. However this is a relatively new API and is still marked
>>>> @Experimental. Keeping the old API would retain the potential confusion.
>>>> It's better to simplify the API surface: having two APIs makes it less
>>>> clear which one users should choose.
>>>
>>>
>>> Supporting additional data types sounds good. But the above states
>>> string API will replace the existing API. I do not see how string API makes
>>> the semantics more clear.  Semantically both are same to the user.
>>>
>>> On Fri, Apr 6, 2018 at 9:31 AM Pablo Estrada <pabloem@google.com> wrote:
>>>
>>>> Hi Ben : D
>>>>
>>>> Sure, that's reasonable. And perhaps I started the discussion in the
>>>> wrong direction. I'm not questioning the utility of Gauge metrics.
>>>>
>>>> What I'm saying is that Beam only supports integers,, but Gauges are
>>>> aggregated by dropping old values depending on their update times; so it
>>>> might be desirable to not restrict the data type to just integers.
>>>>
>>>> -P.
>>>>
>>>> On Fri, Apr 6, 2018 at 9:19 AM Ben Chambers <bchambers@apache.org>
>>>> wrote:
>>>>
>>>>> See for instance how gauge metrics are handled in Prometheus, Datadog
>>>>> and Stackdriver monitoring. Gauges are perfect for use in distributed
>>>>> systems, they just need to be properly labeled. Perhaps we should apply
a
>>>>> default tag or allow users to specify one.
>>>>>
>>>>> On Fri, Apr 6, 2018, 9:14 AM Ben Chambers <bchambers@apache.org>
>>>>> wrote:
>>>>>
>>>>>> Some metrics backend label the value, for instance with the worker
>>>>>> that sent it. Then the aggregation is latest per label. This makes
it
>>>>>> useful for holding values such as "memory usage" that need to hold
current
>>>>>> value.
>>>>>>
>>>>>> On Fri, Apr 6, 2018, 9:00 AM Scott Wegner <swegner@google.com>
wrote:
>>>>>>
>>>>>>> +1 on the proposal to support a "String" gauge.
>>>>>>>
>>>>>>> To expand a bit, the current API doesn't make it clear that the
>>>>>>> gauge value is based on local state. If a runner chooses to parallelize
a
>>>>>>> DoFn across many workers, each worker will have its own local
Gauge metric
>>>>>>> and its updates will overwrite other values. For example, from
the API it
>>>>>>> looks like you could use a gauge to implement your own element
count metric:
>>>>>>>
>>>>>>> long count = 0;
>>>>>>> @ProcessElement
>>>>>>> public void processElement(ProcessContext c) {
>>>>>>>   myGauge.set(++count);
>>>>>>>   c.output(c.element());
>>>>>>> }
>>>>>>>
>>>>>>> This looks correct, but each worker has their own local 'count'
>>>>>>> field, and gauge metric updates from parallel workers will overwrite
each
>>>>>>> other rather than get aggregated. So the final value would be
"the number
>>>>>>> of elements processed on one of the workers". (The correct implementation
>>>>>>> uses a Counter metric).
>>>>>>>
>>>>>>> I would be in favor of replacing the existing Gauge.set(long)
API
>>>>>>> with the String version and removing the old one. This would
be a breaking
>>>>>>> change. However this is a relatively new API and is still marked
>>>>>>> @Experimental. Keeping the old API would retain the potential
confusion.
>>>>>>> It's better to simplify the API surface: having two APIs makes
it less
>>>>>>> clear which one users should choose.
>>>>>>>
>>>>>>> On Fri, Apr 6, 2018 at 8:28 AM Pablo Estrada <pabloem@google.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hello all,
>>>>>>>> As I was working on adding support for Gauges in Dataflow,
some
>>>>>>>> noted that Gauge is a fairly unusual kind of metric for a
distributed
>>>>>>>> environment, since many workers will report different values
and stomp on
>>>>>>>> each other's all the time.
>>>>>>>>
>>>>>>>> We also looked at Flink and Dropwizard Gauge metrics [1][2],
and we
>>>>>>>> found that these use generics, and Flink explicitly mentions
that a
>>>>>>>> toString implementation is required[3].
>>>>>>>>
>>>>>>>> With that in mind, I'm thinking that it might make sense
to 1)
>>>>>>>> expand Gauge to support string values (keep int-based API
for backwards
>>>>>>>> compatibility), and migrate it to use string behind the covers.
>>>>>>>>
>>>>>>>> What does everyone think about this?
>>>>>>>>
>>>>>>>> Best
>>>>>>>> -P.
>>>>>>>>
>>>>>>>> 1 -
>>>>>>>> https://ci.apache.org/projects/flink/flink-docs-release-1.3/monitoring/metrics.html#metric-types
>>>>>>>> 2 - https://metrics.dropwizard.io/3.1.0/manual/core/#gauges
>>>>>>>> 3 -
>>>>>>>> https://github.com/apache/flink/blob/master/docs/monitoring/metrics.md#gauge
>>>>>>>> JIRA issue for Gauge metrics -
>>>>>>>> https://issues.apache.org/jira/browse/BEAM-1616
>>>>>>>> --
>>>>>>>> Got feedback? go/pabloem-feedback
>>>>>>>> <https://goto.google.com/pabloem-feedback>
>>>>>>>>
>>>>>>> --
>>>>>>>
>>>>>>>
>>>>>>> Got feedback? http://go/swegner-feedback
>>>>>>>
>>>>>> --
>>>> Got feedback? go/pabloem-feedback
>>>> <https://goto.google.com/pabloem-feedback>
>>>>
>>>

Mime
View raw message