From: Fabian Hueske
Date: Mon, 4 Jun 2018 10:44:33 +0200
Subject: Re: JVM metrics disappearing after job crash, restart
To: Nikolas Davis
Cc: Ajay Tripathy, user
Delivered-To: mailing list user@flink.apache.org

Hi Nik,

Can you have a look at this JIRA ticket [1] and check if it is related to
the problems you are facing?
If so, would you mind leaving a comment there?

Thank you,
Fabian

[1] https://issues.apache.org/jira/browse/FLINK-8946

2018-05-31 4:41 GMT+02:00 Nikolas Davis <ndavis@newrelic.com>:

> We keep track of metrics by using the value of MetricGroup::getMetricIdentifier,
> which returns the fully qualified metric name. The query that we use to
> monitor metrics filters for metrics IDs that match '%Status.JVM.Memory%'.
> As long as the new metrics come online via the MetricReporter interface
> then I think the chart would be continuous; we would just see the old JVM
> memory metrics cycle into new metrics.
>
> Nik Davis
> Software Engineer
> New Relic
>
> On Wed, May 30, 2018 at 5:30 PM, Ajay Tripathy <ajayt@yelp.com> wrote:
>
>> How are your metrics dimensionalized/named? Task managers often have UIDs
>> generated for them. The task id dimension will change on restart. If you
>> name your metric based on this 'task_id' there would be a discontinuity
>> with the old metric.
>>
>> On Wed, May 30, 2018 at 4:49 PM, Nikolas Davis <ndavis@newrelic.com>
>> wrote:
>>
>>> Howdy,
>>>
>>> We are seeing our task manager JVM metrics disappear over time. This
>>> last time we correlated it to our job crashing and restarting. I wasn't
>>> able to grab the failing exception to share. Any thoughts?
>>>
>>> We track metrics through the MetricReporter interface. As far as I can
>>> tell this more or less only affects the JVM metrics. I.e. most / all other
>>> metrics continue reporting fine as the job is automatically restarted.
>>>
>>> Nik Davis
>>> Software Engineer
>>> New Relic
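
The naming question in the quoted thread (exact metric name versus a '%Status.JVM.Memory%' filter) can be illustrated with a small sketch. It does not use Flink; the identifiers below are hypothetical examples shaped like fully qualified task manager metric names, and the assumption that the ID segment changes after a restart is taken from Ajay's message, not verified. The helper names `like_to_regex` and `matching` are made up for this illustration.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern ('%' = any run, '_' = one char) to a regex."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matching(identifiers, pattern):
    """Return the identifiers that match the LIKE pattern, sorted."""
    rx = like_to_regex(pattern)
    return sorted(i for i in identifiers if rx.fullmatch(i))


# Hypothetical identifiers reported before and after a restart.
# Assumption: the task manager ID segment (tm-aaa -> tm-bbb) changes.
before_restart = {
    "host-1.taskmanager.tm-aaa.Status.JVM.Memory.Heap.Used",
    "host-1.taskmanager.tm-aaa.Status.JVM.CPU.Load",
}
after_restart = {
    "host-1.taskmanager.tm-bbb.Status.JVM.Memory.Heap.Used",
    "host-1.taskmanager.tm-bbb.Status.JVM.CPU.Load",
}

pattern = "%Status.JVM.Memory%"
old = matching(before_restart, pattern)
new = matching(after_restart, pattern)

# A chart keyed on the exact identifier sees no overlap (a discontinuity),
# while the substring filter still selects a memory metric on both sides.
exact_overlap = set(old) & set(new)
print(old, new, exact_overlap)
```

If the filter matches on both sides, as here, a gap in the chart would point at the metrics not being re-registered with the reporter at all, which is the situation worth comparing against the JIRA ticket above.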