flink-dev mailing list archives

From Till Rohrmann <trohrm...@apache.org>
Subject Re: Adding custom monitoring to Flink
Date Tue, 19 Apr 2016 10:30:04 GMT
Hi Maxim,

I think the corresponding JIRA issue is
https://issues.apache.org/jira/browse/FLINK-456

Cheers,
Till

On Thu, Apr 14, 2016 at 10:50 PM, Maxim <mfateev@gmail.com> wrote:

> I don't have a full list of metrics, but I'm interested in everything
> related to runtime performance and possible bottlenecks of the system: all
> interprocess communication counters, errors, latencies, checkpoint sizes
> and checkpointing latencies, buffer allocations and releases, etc.
> Because we aggregate the data ourselves, we can produce multiple views of
> the same metric: min, max, tp99, tp99.9, top n, etc.
>
> Could you point me to the doc/JIRA/diff for your change?
>
>
> On Thu, Apr 14, 2016 at 12:32 PM, Chesnay Schepler <chesnay@apache.org>
> wrote:
>
> > I'm currently working on a metric system that
> > a) exposes several TaskManager metrics
> > b) allows gathering metrics in various parts of a task, most notably
> > user-defined functions.
> >
> > The first version makes these metrics available via JMX on each
> > TaskManager.
> > While a mechanism to make that pluggable is /planned/ there are no
> > details on that yet.
> >
> > I /guess/ once it is merged you should be able to modify one of the
> > classes so that the data is directly
> > exported to your tool, but I would have to know more about it to make a
> > definite assessment.
> >
> > There are no plans to funnel all those metrics unaggregated through
> > Flink's accumulator mechanism;
> > only a selection that will be aggregated locally and on the JobManager to
> > display in the Dashboard.
> >
> > Out of curiosity, what metrics are you interested in?
> >
> >
> > On 14.04.2016 20:59, Maxim wrote:
> >
> >> Hi!
> >> I'm looking into integrating Flink into our stack and one of the
> >> requirements is to report metrics to an internal system. The current
> >> Accumulators are not adequate to provide the visibility we need to run
> >> such a system in production. We want much more information about the
> >> internal cluster state and the ability to calculate aggregates ourselves.
> >> The core reporting API accepts a metric name, a metric type (gauge,
> >> counter, timer) and a set of key-value pairs that act as dimensions.
> >>
> >> The ideal solution for us would report the metrics through such an API
> >> and provide a default binding to the existing Accumulators, but allow
> >> overriding it with our internal reporting client.
> >>
> >> Is this something that could be added to Flink, or are there other
> >> plans for monitoring?
> >>
> >> Thanks!
> >>
> >> Maxim.
> >>
> >>
> >
>

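For readers following the thread, the reporting API Maxim describes (a metric name, a metric type of gauge, counter or timer, and key-value pairs acting as dimensions) might be sketched roughly as below. This is only an illustration: every name here (MetricType, MetricReporter, InMemoryReporter) is hypothetical and is not an existing Flink API, and the numeric value parameter is an assumption, since the thread does not say how values are passed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical names throughout; none of this is an existing Flink API.
enum MetricType { GAUGE, COUNTER, TIMER }

// The pluggable reporting interface: name, type, value, and dimensions.
// The "value" parameter is an assumption; the thread does not specify it.
interface MetricReporter {
    void report(String name, MetricType type, double value, Map<String, String> dimensions);
}

// Stand-in for a binding to an internal reporting client.
// It only records each call in memory so the sketch can be exercised.
final class InMemoryReporter implements MetricReporter {
    final List<String> lines = new ArrayList<>();

    @Override
    public void report(String name, MetricType type, double value, Map<String, String> dimensions) {
        // Copying into a TreeMap renders the dimensions in a stable key order.
        lines.add(name + " " + type + " " + value + " " + new TreeMap<>(dimensions));
    }
}
```

A default binding would forward these calls to Flink's Accumulators, while a deployment like the one described could substitute its own MetricReporter implementation and compute aggregates such as min, max or tp99 downstream.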