flink-user mailing list archives

From Fabian Hueske <fhue...@gmail.com>
Subject Re: JVM metrics disappearing after job crash, restart
Date Mon, 04 Jun 2018 08:44:33 GMT
Hi Nik,

Can you have a look at this JIRA ticket [1] and check whether it is related
to the problems you are facing?
If so, would you mind leaving a comment there?

Thank you,

[1] https://issues.apache.org/jira/browse/FLINK-8946

2018-05-31 4:41 GMT+02:00 Nikolas Davis <ndavis@newrelic.com>:

> We keep track of metrics by using the value of MetricGroup::getMetricIdentifier,
> which returns the fully qualified metric name. The query that we use to
> monitor metrics filters for metrics IDs that match '%Status.JVM.Memory%'.
> As long as the new metrics come online via the MetricReporter interface
> then I think the chart would be continuous; we would just see the old JVM
> memory metrics cycle into new metrics.
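The filtering described above can be sketched in plain Java. A SQL LIKE pattern with leading and trailing wildcards, such as '%Status.JVM.Memory%', is equivalent to a substring match on the fully qualified identifier returned by MetricGroup::getMetricIdentifier. The class and method names below are illustrative, not part of Flink's API, and the identifier shapes are assumed examples (the exact scope layout depends on the configured scope patterns):

```java
// Minimal sketch of the '%Status.JVM.Memory%' filter as a substring match
// on fully qualified metric identifiers.
public class JvmMetricFilter {

    // A LIKE pattern wrapped in '%' on both sides matches any identifier
    // that contains the literal in the middle.
    static boolean matchesJvmMemoryFilter(String metricIdentifier) {
        return metricIdentifier.contains("Status.JVM.Memory");
    }

    public static void main(String[] args) {
        // Assumed example identifiers in the shape produced by
        // MetricGroup::getMetricIdentifier under a default-style scope.
        String jvmMetric = "host1.taskmanager.tm-42.Status.JVM.Memory.Heap.Used";
        String jobMetric = "host1.taskmanager.tm-42.myJob.myOperator.numRecordsIn";

        System.out.println(matchesJvmMemoryFilter(jvmMetric)); // true
        System.out.println(matchesJvmMemoryFilter(jobMetric)); // false
    }
}
```

As long as restarted task managers register their JVM metrics with identifiers still containing the Status.JVM.Memory infix, this filter keeps matching them regardless of any changed scope components.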
> Nik Davis
> Software Engineer
> New Relic
> On Wed, May 30, 2018 at 5:30 PM, Ajay Tripathy <ajayt@yelp.com> wrote:
>> How are your metrics dimensionalized/named? Task managers often have UIDs
>> generated for them. The task id dimension will change on restart. If you
>> name your metric based on this 'task_id' there would be a discontinuity
>> with the old metric.
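The discontinuity described above can be illustrated with a small Java sketch. It assumes an identifier in the shape "<host>.taskmanager.<tm_id>.<metric>", where the tm_id is regenerated after a restart; the helper names here are hypothetical, not Flink API:

```java
// Sketch of why keying a time series on the full metric identifier breaks
// across restarts, while keying on the scope-stripped suffix does not.
public class MetricContinuity {

    // Build an identifier in an assumed "<host>.taskmanager.<tm_id>.<metric>"
    // shape; the tm_id component changes when the task manager restarts.
    static String identifier(String host, String tmId, String metricSuffix) {
        return host + ".taskmanager." + tmId + "." + metricSuffix;
    }

    // Hypothetical stable key: drop host, "taskmanager", and tm_id so the
    // series key survives a restart.
    static String stableKey(String identifier) {
        String[] parts = identifier.split("\\.", 4); // host, taskmanager, tm_id, rest
        return parts[3];
    }

    public static void main(String[] args) {
        String before = identifier("host1", "a1b2c3", "Status.JVM.Memory.Heap.Used");
        String after  = identifier("host1", "d4e5f6", "Status.JVM.Memory.Heap.Used");

        System.out.println(before.equals(after));                       // false
        System.out.println(stableKey(before).equals(stableKey(after))); // true
    }
}
```

In other words, a chart keyed on the full identifier starts a brand-new series after each restart, while a chart keyed on the restart-independent suffix stays continuous.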
>> On Wed, May 30, 2018 at 4:49 PM, Nikolas Davis <ndavis@newrelic.com>
>> wrote:
>>> Howdy,
>>> We are seeing our task manager JVM metrics disappear over time. This
>>> last time we correlated it to our job crashing and restarting. I wasn't
>>> able to grab the failing exception to share. Any thoughts?
>>> We track metrics through the MetricReporter interface. As far as I can
>>> tell, this more or less only affects the JVM metrics; most, if not all,
>>> other metrics continue reporting fine after the job is automatically restarted.
>>> Nik Davis
>>> Software Engineer
>>> New Relic
