flink-user mailing list archives

From Nick Dimiduk <ndimi...@apache.org>
Subject Re: Accumulators/Metrics
Date Mon, 14 Dec 2015 20:54:43 GMT
Hi Christian,

I've returned to this project and am interested in exploring options. Have
you released any of your work yet? Have you considered an implementation
where each Flink worker exposes its own metrics via a "well known
interface" -- such as HTTP or JMX -- and an external process pushes
those metrics to a central data store? This is the architecture pursued by
OpenTSDB and Sensu.
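
To make the idea concrete, here is a rough sketch of that pull-style
setup using only the Python standard library. The /metrics path, the JSON
payload and every name below are assumptions made for illustration; none
of them is an existing Flink interface.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process metric values; a real worker would read these
# from its own metric registry.
METRICS = {"records_in": 0}
METRICS_LOCK = threading.Lock()


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves the current metric values as JSON on GET /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        with METRICS_LOCK:
            body = json.dumps(METRICS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def start_worker_endpoint(port=0):
    """Starts the endpoint in a daemon thread; port 0 picks a free port."""
    server = HTTPServer(("127.0.0.1", port), MetricsHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def scrape(host, port):
    """What the external process would do: pull one worker's metrics."""
    url = "http://%s:%d/metrics" % (host, port)
    # An empty ProxyHandler bypasses any proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

The external process would call scrape() for each worker on a schedule
and forward the results to the central store, so no metric traffic has to
pass through the job manager.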

Thanks,
Nick

On Thu, Nov 12, 2015 at 1:01 PM, mnxfst <mnxfst@gmail.com> wrote:

> Hi Nick,
>
> As Max mentioned in an earlier post on this topic, I started to work on a
> service to collect metrics from running stream processing jobs. We want to
> have all our metrics in one place, whatever type of application they come
> from.
>
> To integrate that behavior, I started to look at the accumulator API and
> learned from Max that all this information is collected for each task and
> forwarded to the job manager. The job manager in turn provides a
> network-exposed interface to interact with it using Akka (see
> org.apache.flink.runtime.messages.JobManagerMessages for more).
>
> What I did was request the list of all running jobs and then fetch more
> detailed information for each of them. The response contains the
> accumulator values previously set.
>
> As the API currently provides only simple value counters, a basic average
> accumulator and a histogram (which I have not worked with yet), I started to
> extend it to allow the use of metrics somewhat similar to the gauges, meters,
> timers, histograms and counters defined by the Dropwizard metrics
> framework.
>
> Unfortunately, an integration with the framework seems to require
> something of a wild hack. Therefore I decided to try out a smarter approach
> that also keeps things simple on the Flink framework side.
>
> If you know Graphite, you will know that it receives a metric identifier,
> the current value and a timestamp as input. Everything else is handled
> either by Graphite or by a StatsD instance placed in front of it.
>
> To reduce any dependency on such external tools, I am currently working on
> a basic metric implementation which provides the metrics mentioned above.
> These are aggregated by the collector and may be forwarded to any
> metrics system, e.g. Graphite.
>
> The overall idea is to keep things very simple, as overly complex metric
> types on the job side would have to be transferred over the network and
> could cause heavy traffic. Keep it simple and do the aggregation on the
> collector side.
>
> Your objection regarding the network traffic towards the job manager is
> valid and important. I haven't really thought about that so far, but maybe
> a more distributed approach must be found to avoid a bottleneck
> here.
>
> If you are interested in the solution that will be used throughout the jobs
> running in our environment, I hope this will be released as open source
> sometime soon, since the Otto Group believes in open source ;-) If you would
> like to know more about it, feel free to ask ;-)
>
> Best
>   Christian (Kreutzfeldt)
>
>
> Nick Dimiduk wrote
> > I'm much more interested in as-they-happen metrics than job completion
> > summaries, as these are stream processing jobs that should "never end".
> > Ufuk's suggestion of a subtask-unique counter, combined with
> > rate-of-change functions in a tool like InfluxDB, will probably work for
> > my needs. So too will managing my own Dropwizard MetricRegistry.
> >
> > An observation: routing all online metrics through the heartbeat
> > mechanism to a single host for display sounds like a scalability
> > bottleneck. Doesn't this design limit the practical volume of metrics
> > that can be exposed by the runtime and user applications?
>
> --
> View this message in context:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Accumulators-Metrics-tp3447p3459.html
> Sent from the Apache Flink User Mailing List archive at Nabble.com.
>
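
The collector-side aggregation and Graphite forwarding described in the
quoted message might look roughly like the sketch below. It assumes the
jobs report plain (metric identifier, value) pairs and that the collector
sums them; the function names and metric identifiers are hypothetical and
are not taken from Christian's implementation. Only the line layout,
"<identifier> <value> <timestamp>", follows Graphite's plaintext input.

```python
import time
from collections import defaultdict


def aggregate_counters(reports):
    """Sums raw counter values per metric identifier.

    `reports` is an iterable of (metric_id, value) pairs, e.g. one pair per
    subtask. The jobs only ship plain numbers; the summing happens here.
    """
    totals = defaultdict(int)
    for metric_id, value in reports:
        totals[metric_id] += value
    return dict(totals)


def to_graphite_lines(totals, timestamp=None):
    """Formats totals as Graphite plaintext: '<identifier> <value> <timestamp>'."""
    if timestamp is None:
        timestamp = int(time.time())
    return ["%s %s %d" % (metric_id, value, timestamp)
            for metric_id, value in sorted(totals.items())]
```

Each resulting line would be sent, newline-terminated, to Graphite's
plaintext listener. Keeping the per-job payload to bare numbers is what
keeps the network traffic small.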
