Mailing-List: contact user-help@flink.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@flink.apache.org
MIME-Version: 1.0
In-Reply-To: <B338A888-43AA-464C-BCAB-85DF7C79DDB0@apache.org>
References: 
 <CANZa=GvaTQ2O7u3dCb+97ridnsadf_SK9NrBipQ6SOm33tMFOg@mail.gmail.com>
 <CAGco--Z74OAccb3xXcy8LbfFb-x8rMeKFGB2_eiMT1uyfarZbQ@mail.gmail.com>
 <B338A888-43AA-464C-BCAB-85DF7C79DDB0@apache.org>
From: Nick Dimiduk <ndimiduk@gmail.com>
Date: Thu, 12 Nov 2015 09:50:06 -0800
Message-ID: 
 <CANZa=GvswavyZ4XujnFZrgd2gu5xbQazdDbzHtw3Nfy37t16Yw@mail.gmail.com>
Subject: Re: Accumulators/Metrics
To: user@flink.apache.org
Cc: "Kreutzfeldt, Christian" <Christian.Kreutzfeldt@ottogroup.com>
Content-Type: multipart/alternative; boundary=089e01228d9ef436ad05245b930d

--089e01228d9ef436ad05245b930d
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

I'm much more interested in as-they-happen metrics than job-completion
summaries, as these are stream processing jobs that should "never end".
Ufuk's suggestion of a subtask-unique counter, combined with rate-of-change
functions in a tool like InfluxDB, will probably work for my needs. So too
would managing my own Dropwizard MetricRegistry.
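
The sketch below shows the rate-of-change idea. It turns periodic readings of one subtask's cumulative counter into per-second rates, similar to what a derivative function in a time-series store computes. The `CounterRates` class and `Sample` record are hypothetical names. Treating any decrease as a counter reset (for example after a subtask restart) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: converts readings of a cumulative counter into
// per-second rates between consecutive readings.
final class CounterRates {

    // One reading of a subtask's counter: when it was taken and its value.
    record Sample(long timestampMillis, long value) {}

    static List<Double> perSecond(List<Sample> samples) {
        List<Double> rates = new ArrayList<>();
        for (int i = 1; i < samples.size(); i++) {
            Sample prev = samples.get(i - 1);
            Sample curr = samples.get(i);
            long elapsedMillis = curr.timestampMillis() - prev.timestampMillis();
            if (elapsedMillis <= 0) {
                continue; // skip duplicate or out-of-order readings
            }
            // Assumption: a decrease means the counter was reset (e.g. the
            // subtask restarted), so the new value is the count since the reset.
            long delta = curr.value() >= prev.value()
                    ? curr.value() - prev.value()
                    : curr.value();
            rates.add(delta * 1000.0 / elapsedMillis);
        }
        return rates;
    }

    public static void main(String[] args) {
        List<Sample> samples = List.of(
                new Sample(0, 100),
                new Sample(10_000, 600),  // +500 over 10 s
                new Sample(20_000, 50));  // reset; 50 since the restart
        System.out.println(perSecond(samples)); // [50.0, 5.0]
    }
}
```
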

An observation: routing all online metrics through the heartbeat mechanism
to a single host for display sounds like a scalability bottleneck. Doesn't
this design limit the practical volume of metrics that can be exposed by
the runtime and user applications?

On Thu, Nov 12, 2015 at 6:12 AM, Ufuk Celebi <uce@apache.org> wrote:

> Hey Nick,
>
> you can do the following for per-task stats (this is kind of a
> workaround):
>
> Create an Accumulator with the subtask index in the name, e.g.
>
> int subtaskIndex = getRuntimeContext().getIndexOfThisSubtask();
> IntCounter counter = getRuntimeContext().getIntCounter("counter-" + subtaskIndex);
>
> This way you have one accumulator per subtask.
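
The following small model shows why the per-subtask naming works. It assumes, as with Flink's IntCounter, that accumulators reported under the same name are merged by summing. The `AccumulatorNaming` class and `mergeByName` method are hypothetical. In a real job, this merging happens inside Flink rather than in user code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model: each subtask reports name -> local count, and results
// with the same name are merged by summing (like IntCounter's merge).
final class AccumulatorNaming {

    static Map<String, Integer> mergeByName(List<Map<String, Integer>> perSubtask) {
        Map<String, Integer> merged = new TreeMap<>(); // sorted for stable output
        for (Map<String, Integer> local : perSubtask) {
            local.forEach((name, count) -> merged.merge(name, count, Integer::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        int[] localCounts = {3, 5, 7}; // array index = subtask index
        List<Map<String, Integer>> shared = new ArrayList<>();
        List<Map<String, Integer>> unique = new ArrayList<>();
        for (int i = 0; i < localCounts.length; i++) {
            shared.add(Map.of("counter", localCounts[i]));      // same name everywhere
            unique.add(Map.of("counter-" + i, localCounts[i])); // name includes the index
        }
        System.out.println(mergeByName(shared)); // {counter=15}
        System.out.println(mergeByName(unique)); // {counter-0=3, counter-1=5, counter-2=7}
    }
}
```
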
>
> The web interface will display the values as they are set (I’m not sure
> whether that has been merged yet). You can also gather the stats from the
> execution result, e.g.
> JobExecutionResult res = env.execute();
> Map<String, Object> accumulators = res.getAllAccumulatorResults();
>
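
This sketch reads the per-subtask counters back out of the `Map<String, Object>` that `getAllAccumulatorResults()` returns. `SubtaskResults.bySubtask` is a hypothetical helper. It assumes the counters were registered as IntCounters named `"counter-" + subtaskIndex`, so their results come back as `Integer`s.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper: collects "<prefix><subtaskIndex>" results, keyed by index.
final class SubtaskResults {

    static SortedMap<Integer, Integer> bySubtask(Map<String, Object> results, String prefix) {
        SortedMap<Integer, Integer> out = new TreeMap<>();
        for (Map.Entry<String, Object> e : results.entrySet()) {
            String name = e.getKey();
            if (!name.startsWith(prefix) || !(e.getValue() instanceof Integer)) {
                continue; // not one of the per-subtask IntCounter results
            }
            try {
                int subtask = Integer.parseInt(name.substring(prefix.length()));
                out.put(subtask, (Integer) e.getValue());
            } catch (NumberFormatException ignored) {
                // the name only happens to start with the prefix; skip it
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> results = Map.of(
                "counter-10", 4, "counter-2", 9, "other", 1L);
        System.out.println(bySubtask(results, "counter-")); // {2=9, 10=4}
    }
}
```
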
>
> You can also add a custom Accumulator variant that simply sets one
> value, if that is what you need.
>
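
Here is one possible shape for such a set-one-value accumulator. To keep the sketch self-contained, it declares a local interface that mirrors Flink's `org.apache.flink.api.common.accumulators.Accumulator` (`add`, `getLocalValue`, `resetLocal`, `merge`, `clone`). In a job, you would implement Flink's interface instead and register an instance with `getRuntimeContext().addAccumulator(name, accumulator)`. `LastValueAccumulator` is a hypothetical name, and keeping the larger value on merge is an arbitrary choice made for the example.

```java
import java.io.Serializable;

// Local stand-in with the same shape as Flink's Accumulator<V, R>, so this
// sketch compiles without Flink on the classpath.
interface Accumulator<V, R extends Serializable> extends Serializable, Cloneable {
    void add(V value);
    R getLocalValue();
    void resetLocal();
    void merge(Accumulator<V, R> other);
    Accumulator<V, R> clone();
}

// Hypothetical gauge-like accumulator: add() overwrites instead of summing.
final class LastValueAccumulator implements Accumulator<Long, Long> {
    private long value;

    @Override public void add(Long v) { value = v; }
    @Override public Long getLocalValue() { return value; }
    @Override public void resetLocal() { value = 0L; }

    // Arbitrary but deterministic merge rule for this sketch: keep the larger value.
    @Override public void merge(Accumulator<Long, Long> other) {
        value = Math.max(value, other.getLocalValue());
    }

    @Override public LastValueAccumulator clone() {
        LastValueAccumulator copy = new LastValueAccumulator();
        copy.value = value;
        return copy;
    }

    public static void main(String[] args) {
        LastValueAccumulator filtersLoaded = new LastValueAccumulator();
        filtersLoaded.add(12L);
        filtersLoaded.add(8L); // overwrites rather than sums
        System.out.println(filtersLoaded.getLocalValue()); // 8
    }
}
```
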
> Does this help?
>
> In any case, I agree that it would be nice to expose a special
> API/accumulator for this via the runtime context.
>
> – Ufuk
>
> > On 12 Nov 2015, at 11:55, Maximilian Michels <mxm@apache.org> wrote:
> >
> > Hi Nick,
> >
> > I don't know if you have already come across the REST API. If not,
> > please have a look here:
> > https://ci.apache.org/projects/flink/flink-docs-master/internals/monitoring_rest_api.html
> >
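
The sketch below polls accumulators over that REST API using the JDK's `java.net.http` client (Java 11+). The `AccumulatorPoller` class is hypothetical. The base URL (`http://localhost:8081`, the usual default web port) and the endpoint paths are assumptions; check them against the linked page for your Flink version. Parsing the JSON response is left out.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical poller; the endpoint paths are assumptions taken from the
// monitoring REST API docs and should be verified for the running version.
final class AccumulatorPoller {

    static URI jobAccumulators(String baseUrl, String jobId) {
        return URI.create(baseUrl + "/jobs/" + jobId + "/accumulators");
    }

    static URI subtaskAccumulators(String baseUrl, String jobId, String vertexId) {
        return URI.create(baseUrl + "/jobs/" + jobId + "/vertices/" + vertexId
                + "/subtasks/accumulators");
    }

    // Returns the raw JSON body; use a JSON library of your choice to parse it.
    static String fetch(HttpClient client, URI uri) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: AccumulatorPoller <jobId>");
            return;
        }
        URI uri = jobAccumulators("http://localhost:8081", args[0]);
        System.out.println(fetch(HttpClient.newHttpClient(), uri));
    }
}
```
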
> > I know that Christian Kreutzfeldt (cc) has been working on a
> > monitoring service which uses Akka messages to query the JobManager on
> > a job's status and accumulators. I'm wondering whether you two could
> > work together on this.
> >
> > Cheers,
> > Max
> >
> > On Wed, Nov 11, 2015 at 6:44 PM, Nick Dimiduk <ndimiduk@gmail.com> wrote:
> >> Hello,
> >>
> >> I'm interested in exposing metrics from my UDFs. I see FLINK-1501 exposes
> >> task manager metrics via a UI; it would be nice to plug into the same
> >> MetricRegistry to register my own (i.e., gauges). I don't see this exposed
> >> via the runtime context. This did lead me to discover the Accumulators API.
> >> This looks more oriented to simple counts, which are summed across
> >> components of a batch job. In my case, I'd like to expose details of my
> >> stream processing vertices so that I can monitor their correctness and
> >> health re: runtime decisions. For instance, referring back to my previous
> >> thread, I would like to expose the number of filters loaded into my custom
> >> RichCoFlatMap so that I can easily monitor this value.
> >>
> >> Thanks,
> >> Nick
>
>

--089e01228d9ef436ad05245b930d--