flink-dev mailing list archives

From Fabian Hueske <fhue...@apache.org>
Subject Re: Statistics collection for optimization
Date Tue, 02 Dec 2014 20:37:24 GMT
I see mainly two use cases for collecting data locally on the TMs and
sending it to the JM, where it is aggregated.

1) Monitoring of the system and running jobs: This might include system
stats (CPU, disk usage, network traffic & buffer usage, internal memory
utilization, ...) but also progress information (number of processed
elements, histogram of UDF in/out ratio, UDF exec times, etc.).
2) Statistics collection for optimization: Stats would include key counts &
distributions, record count & sizes, UDF stats (in/out ratio, exec times,
...). Depending on the expertise of the user, this information could also
be valuable monitoring information.

In both cases, we need a service that ships the collected data from the TMs
to the JM and aggregates and stores it there.
Once this service is in place, the collection of individual metrics could be
implemented independently.
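
To make the collect-locally, aggregate-centrally idea concrete, here is a
minimal sketch in plain Java. It is not Flink code: FieldStats,
StatsAggregator, StatsSketch and the operator id "map-1" are hypothetical
names invented for illustration, and the transport from TM to JM (for
example over Akka) is left out entirely. The only point is that per-task
partial statistics need a merge operation that can be applied in any order.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical per-task statistics holder (TM side); not part of Flink's API.
final class FieldStats {
    private long count = 0;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    // Exact distinct tracking keeps the sketch short; a real collector would
    // probably need a bounded-size estimator instead.
    private final Set<Long> distinct = new HashSet<>();

    // Called once per record seen by the task.
    void add(long value) {
        count++;
        min = Math.min(min, value);
        max = Math.max(max, value);
        distinct.add(value);
    }

    // Merging is associative and commutative, so partial results from
    // different tasks can be combined in whatever order they arrive.
    void merge(FieldStats other) {
        count += other.count;
        min = Math.min(min, other.min);
        max = Math.max(max, other.max);
        distinct.addAll(other.distinct);
    }

    long count() { return count; }
    long min() { return min; }
    long max() { return max; }
    int distinctCount() { return distinct.size(); }
}

// Hypothetical JM-side store that aggregates the partials per operator.
final class StatsAggregator {
    private final Map<String, FieldStats> byOperator = new HashMap<>();

    // Stands in for the handler of a "stats report" message from a TM.
    void report(String operatorId, FieldStats partial) {
        byOperator.computeIfAbsent(operatorId, k -> new FieldStats()).merge(partial);
    }

    FieldStats get(String operatorId) {
        return byOperator.get(operatorId);
    }
}

final class StatsSketch {
    public static void main(String[] args) {
        // Two parallel tasks of the same operator collect stats locally.
        FieldStats task1 = new FieldStats();
        for (long v : new long[] {3, 7, 7}) task1.add(v);
        FieldStats task2 = new FieldStats();
        for (long v : new long[] {1, 7}) task2.add(v);

        // The JM merges whatever the tasks ship to it.
        StatsAggregator jm = new StatsAggregator();
        jm.report("map-1", task1);
        jm.report("map-1", task2);

        FieldStats total = jm.get("map-1");
        System.out.println("count=" + total.count() + " min=" + total.min()
            + " max=" + total.max() + " distinct=" + total.distinctCount());
    }
}
```

Because the merge step does not care which task a partial came from, the
same aggregator could serve both use cases above; only the contents of the
partial objects would differ.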

2014-12-02 14:57 GMT+01:00 Alexander Alexandrov <
alexander.s.alexandrov@gmail.com>:

> This is another way to do it.
>
> I just created a JIRA issue for that:
>
> https://issues.apache.org/jira/browse/FLINK-1297
>
> If you can give me some pointers and suggest implementation strategies I
> can try to prototype something in a feature branch over the weekend and
> share it for review.
>
>
>
> 2014-12-02 14:43 GMT+01:00 Ufuk Celebi <uce@apache.org>:
>
> > Have you also thought about adding the statistics collection with the
> > writers, i.e. the collector or record writer?
> >
> > If all you care about is the data that the user emits from her code, that
> > should be fine.
> >
> > On Tue, Dec 2, 2014 at 2:33 PM, Robert Metzger <rmetzger@apache.org>
> > wrote:
> >
> > > Yes. I also got the impression that you are looking for something
> > > slightly different.
> > >
> > > It is probably easier for you right now to "hack" something into the
> > > system to get these statistics.
> > >
> > > On Tue, Dec 2, 2014 at 2:25 PM, Alexander Alexandrov <
> > > alexander.s.alexandrov@gmail.com> wrote:
> > >
> > > > I checked the thread. I am not sure whether this is aligned with
> > > > what I want to contribute.
> > > >
> > > > The discussion in the other thread seems to be going in the direction
> > > > of general-purpose monitoring (you are talking about Disk + Network
> > > > IO, input splits).
> > > >
> > > > I would like to have a very thin code base that can be (1)
> > > > transparently injected in UDFs (if you can manipulate the AST), or
> > > > wrapped in identity mappers (if you cannot) in order to gather
> > > > collection statistics (min, max, distinct, maybe some histograms) to
> > > > facilitate incremental optimization.
> > > >
> > > > I agree that this should be based on existing infrastructure (Akka)
> > > > and should not be over-engineered.
> > > >
> > > > I will announce this in the other branch and create a JIRA ticket to
> > > > fix the parameters of what has to be done and the best way to
> > > > implement it with the other contributors.
> > > >
> > > >
> > > >
> > > > 2014-12-02 14:12 GMT+01:00 Kostas Tzoumas <ktzoumas@apache.org>:
> > > >
> > > > > From the status of that thread and absence of a JIRA (as far as I
> > > > > could tell), I would suggest that you start working on this and
> > > > > announce it on the other thread, perhaps Nils would be interested
> > > > > in jumping in.
> > > > >
> > > > > On Tue, Dec 2, 2014 at 2:06 PM, Ufuk Celebi <uce@apache.org> wrote:
> > > > >
> > > > > > Very nice to hear :)
> > > > > >
> > > > > > See this thread:
> > > > > >
> > > > > > http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Enhance-Flink-s-monitoring-capabilities-td2573.html
> > > > > >
> > > > > > On Tue, Dec 2, 2014 at 2:00 PM, Alexander Alexandrov <
> > > > > > alexander.s.alexandrov@gmail.com> wrote:
> > > > > >
> > > > > > > Just a quick shout to check whether somebody is already working
> > > > > > > on a statistics collection component?
> > > > > > >
> > > > > > > If yes, can you point me to previous discussions in the mailing
> > > > > > > list and a WIP branch -- I want to bring myself up to date with
> > > > > > > the ongoing efforts.
> > > > > > >
> > > > > > > If not, I would like to start working on that component and
> > > > > > > ideally integrate some parts of it in the 0.8 release.
> > > > > > >
> > > > > > > Cheers!
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
