Return-Path: X-Original-To: apmail-flink-user-archive@minotaur.apache.org Delivered-To: apmail-flink-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 32D4518763 for ; Thu, 12 Nov 2015 17:50:30 +0000 (UTC) Received: (qmail 31071 invoked by uid 500); 12 Nov 2015 17:50:30 -0000 Delivered-To: apmail-flink-user-archive@flink.apache.org Received: (qmail 30989 invoked by uid 500); 12 Nov 2015 17:50:30 -0000 Mailing-List: contact user-help@flink.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@flink.apache.org Delivered-To: mailing list user@flink.apache.org Received: (qmail 30979 invoked by uid 99); 12 Nov 2015 17:50:29 -0000 Received: from Unknown (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 12 Nov 2015 17:50:29 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 894EC180A8B for ; Thu, 12 Nov 2015 17:50:29 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.879 X-Spam-Level: ** X-Spam-Status: No, score=2.879 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=3, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd3-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-us-west.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id 2pm4T7hUWgxp for ; Thu, 12 Nov 2015 17:50:28 +0000 (UTC) Received: from mail-wm0-f49.google.com (mail-wm0-f49.google.com [74.125.82.49]) by mx1-us-west.apache.org (ASF Mail Server at mx1-us-west.apache.org) with ESMTPS id AE4E620F43 for ; Thu, 12 Nov 2015 17:50:27 +0000 (UTC) Received: by wmdw130 with SMTP id w130so164984334wmd.0 for ; Thu, 12 Nov 2015 09:50:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=LtPqOFRPhm2FpMAV2bvUnZNnwhT668YPEsgw/8bdc9c=; b=orrTDeGL1t0y7548n/ZK0hGPc7+DPUlaE0nY2ZVxcggum9umzm5OvW7FUYa8/7gy5O XHIVWzMI+UlBMwaguA3eTlfNdYqNOb8HyfG4fU2H2hVOO+fuaS5kxQPoXAJ+UQos9lVF /dnAIVOSzkHgPXyM3y2Tf6Y+u3IzAwkh+syEwx0gc/cRiqDWRsCZe8SpGyAe5aifzq5Y 0/Vu5qtfSHcm5qok/8Fh4uesQ1k88o64Si15dTt4C5xhxyhjrI9Fz4PM3zo7N6XuOXNJ 6q0XxIcSRNcHJt81/gfrpuYrktd06ZGnfWIloxL/7b/4v29vT5+50mOWz8kh6SnysMCt Nu3A== X-Received: by 10.194.134.135 with SMTP id pk7mr17493419wjb.111.1447350626430; Thu, 12 Nov 2015 09:50:26 -0800 (PST) MIME-Version: 1.0 Received: by 10.28.178.129 with HTTP; Thu, 12 Nov 2015 09:50:06 -0800 (PST) In-Reply-To: References: From: Nick Dimiduk Date: Thu, 12 Nov 2015 09:50:06 -0800 Message-ID: Subject: Re: Accumulators/Metrics To: user@flink.apache.org Cc: "Kreutzfeldt, Christian" Content-Type: multipart/alternative; boundary=089e01228d9ef436ad05245b930d --089e01228d9ef436ad05245b930d Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable I'm much more interested in as-they-happening metrics than job completion summaries as these are stream processing jobs that should "never end". Ufuk's suggestion of a subtask-unique counter, combined with rate-of-change functions in a tool like InfluxDB will probably work for my needs. So too does managing my own dropwizard MetricRegistry. An observation: routing all online metrics through the heartbeat mechanism to a single host for display sounds like a scalability bottleneck. Doesn't this design limit the practical volume of metrics that can be exposed by the runtime and user applications? On Thu, Nov 12, 2015 at 6:12 AM, Ufuk Celebi wrote: > Hey Nick, > > you can do the following for per task stats (this is kind of an > workaround): > > Create an Accumulator with the subtask index in the name, e.g. > > int subtaskIndex =3D getRuntimeContext().getIndexOfThisSubtask(); > IntCounter counter =3D getRuntimeContext().getIntCounter("counter-" + > subtaskIndex); > > This way you have one accumulator per subtask. > > The web interface will display the values as they are set (I=E2=80=99m no= t sure if > it is in yet). You can also gather the stats from the execution result, e= .g. > ExecutionResult res =3D env.execute(); > res.getAllAccumulatorResults(); > > > You can furthermore add a custom Accumulator variant, which simple sets > one value if this is what you need. > > Does this help? > > In any case, I agree that it would be nice to expose a special > API/accumulator for this via the runtime context. > > =E2=80=93 Ufuk > > > On 12 Nov 2015, at 11:55, Maximilian Michels wrote: > > > > Hi Nick, > > > > I don't know if you have already come across the Rest Api. If not, > > please have a look here: > > > https://ci.apache.org/projects/flink/flink-docs-master/internals/monitori= ng_rest_api.html > > > > I know that Christian Kreutzfeldt (cc) has been working on a > > monitoring service which uses Akka messages to query the JobManager on > > a job's status and accumulators. I'm wondering if you two could engage > > in any way. > > > > Cheers, > > Max > > > > On Wed, Nov 11, 2015 at 6:44 PM, Nick Dimiduk > wrote: > >> Hello, > >> > >> I'm interested in exposing metrics from my UDFs. I see FLINK-1501 > exposes > >> task manager metrics via a UI; it would be nice to plug into the same > >> MetricRegistry to register my own (ie, gauges). I don't see this > exposed via > >> runtime context. This did lead me to discovering the Accumulators API. > This > >> looks more oriented to simple counts, which are summed across > components of > >> a batch job. In my case, I'd like to expose details of my stream > processing > >> vertices so that I can monitor their correctness and health re: runtim= e > >> decisions. For instance, referring back to my previous thread, I would > like > >> to expose the number of filters loaded into my custom RichCoFlatMap so > that > >> I can easily monitor this value. > >> > >> Thanks, > >> Nick > > --089e01228d9ef436ad05245b930d Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
I'm much more interested in as-they-happening metrics = than job completion summaries as these are stream processing jobs that shou= ld "never end". Ufuk's suggestion of a subtask-unique counter= , combined with rate-of-change functions in a tool like InfluxDB will proba= bly work for my needs. So too does managing my own dropwizard MetricRegistr= y.

An observation: routing all online metrics through th= e heartbeat mechanism to a single host for display sounds like a scalabilit= y bottleneck. Doesn't this design limit the practical volume of metrics= that can be exposed by the runtime and user applications?

On Thu, Nov 12, 2015 a= t 6:12 AM, Ufuk Celebi <uce@apache.org> wrote:
Hey Nick,

you can do the following for per task stats (this is kind of an workaround)= :

Create an Accumulator with the subtask index in the name, e.g.

int subtaskIndex =3D getRuntimeContext().getIndexOfThisSubtask();
IntCounter counter =3D getRuntimeContext().getIntCounter("counter-&quo= t; + subtaskIndex);

This way you have one accumulator per subtask.

The web interface will display the values as they are set (I=E2=80=99m not = sure if it is in yet). You can also gather the stats from the execution res= ult, e.g.
ExecutionResult res =3D env.execute();
res.getAllAccumulatorResults();


You can furthermore add a custom Accumulator variant, which simple sets one= value if this is what you need.

Does this help?

In any case, I agree that it would be nice to expose a special API/accumula= tor for this via the runtime context.

=E2=80=93 Ufuk

> On 12 Nov 2015, at 11:55, Maximilian Michels <mxm@apache.org> wrote:
>
> Hi Nick,
>
> I don't know if you have already come across the Rest Api. If not,=
> please have a look here:
> https:= //ci.apache.org/projects/flink/flink-docs-master/internals/monitoring_rest_= api.html
>
> I know that Christian Kreutzfeldt (cc) has been working on a
> monitoring service which uses Akka messages to query the JobManager on=
> a job's status and accumulators. I'm wondering if you two coul= d engage
> in any way.
>
> Cheers,
> Max
>
> On Wed, Nov 11, 2015 at 6:44 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
>> Hello,
>>
>> I'm interested in exposing metrics from my UDFs. I see FLINK-1= 501 exposes
>> task manager metrics via a UI; it would be nice to plug into the s= ame
>> MetricRegistry to register my own (ie, gauges). I don't see th= is exposed via
>> runtime context. This did lead me to discovering the Accumulators = API. This
>> looks more oriented to simple counts, which are summed across comp= onents of
>> a batch job. In my case, I'd like to expose details of my stre= am processing
>> vertices so that I can monitor their correctness and health re: ru= ntime
>> decisions. For instance, referring back to my previous thread, I w= ould like
>> to expose the number of filters loaded into my custom RichCoFlatMa= p so that
>> I can easily monitor this value.
>>
>> Thanks,
>> Nick


--089e01228d9ef436ad05245b930d--