flink-dev mailing list archives

From Márton Balassi <balassi.mar...@gmail.com>
Subject Re: Statistics collection for optimization
Date Tue, 02 Dec 2014 21:28:48 GMT
It would be nice to integrate with existing tools, e.g. Ganglia [1]. These
already cover system statistics (CPU, network, I/O, ...), and one can define
custom stats to monitor.
Hadoop is nicely integrated with it.

[1] http://ganglia.sourceforge.net/

On Tue, Dec 2, 2014 at 9:37 PM, Fabian Hueske <fhueske@apache.org> wrote:

> I see mainly two use cases for collecting data locally on the TMs and
> sending it to the JM, where it is aggregated.
>
> 1) Monitoring of the system and running jobs: This might include system
> stats (CPU, disk usage, network traffic & buffer usage, internal memory
> utilization, ...) but also progress information (number of processed
> elements, histogram of UDF in/out ratio, UDF exec times, etc.).
> 2) Statistics collection for optimization: Stats would include key counts &
> distributions, record count & sizes, UDF stats (in/out ratio, exec times,
> ...). Depending on the expertise of the user, this information could also
> be valuable monitoring information.
>
> In both cases, we need a service to ship collected data from the TMs to the
> JM and to aggregate and store it there.
> Once this service is in place, the collection of metrics could be
> independently implemented.
>
> 2014-12-02 14:57 GMT+01:00 Alexander Alexandrov <
> alexander.s.alexandrov@gmail.com>:
>
> > This is another way to do it.
> >
> > I just created a JIRA issue for that:
> >
> > https://issues.apache.org/jira/browse/FLINK-1297
> >
> > If you can give me some pointers and suggest implementation strategies I
> > can try to prototype something in a feature branch over the weekend and
> > share it for review.
> >
> >
> >
> > 2014-12-02 14:43 GMT+01:00 Ufuk Celebi <uce@apache.org>:
> >
> > > Have you also thought about adding the statistics collection with the
> > > writers, i.e. the collector or record writer?
> > >
> > > If all you care about is the data that the user emits from her code,
> > > that should be fine.
> > >
> > > On Tue, Dec 2, 2014 at 2:33 PM, Robert Metzger <rmetzger@apache.org>
> > > wrote:
> > >
> > > > Yes. I also got the impression that you are looking for something
> > > > slightly different.
> > > >
> > > > It is probably easier for you right now to "hack" something into the
> > > > system to get these statistics.
> > > >
> > > > On Tue, Dec 2, 2014 at 2:25 PM, Alexander Alexandrov <
> > > > alexander.s.alexandrov@gmail.com> wrote:
> > > >
> > > > > I checked the thread. I am not sure whether this is aligned with
> > > > > what I want to contribute.
> > > > >
> > > > > The discussion in the other thread seems to be going in the
> > > > > direction of general-purpose monitoring (you are talking about
> > > > > Disk + Network IO, input splits).
> > > > >
> > > > > I would like to have a very thin code base that can be (1)
> > > > > transparently injected in UDFs (if you can manipulate the AST), or
> > > > > (2) wrapped in identity mappers (if you cannot) in order to gather
> > > > > collection statistics (min, max, distinct, maybe some histograms)
> > > > > to facilitate incremental optimization.
> > > > >
> > > > > I agree that this should be based on existing infrastructure (Akka)
> > > > > and should not be over-engineered.
> > > > >
> > > > > I will announce this in the other branch and create a JIRA ticket
> > > > > to fix the parameters of what has to be done and the best way to
> > > > > implement it with the other contributors.
> > > > >
> > > > >
> > > > >
> > > > > 2014-12-02 14:12 GMT+01:00 Kostas Tzoumas <ktzoumas@apache.org>:
> > > > >
> > > > > > From the status of that thread and absence of a JIRA (as far as
> > > > > > I could tell), I would suggest that you start working on this
> > > > > > and announce it on the other thread, perhaps Nils would be
> > > > > > interested in jumping in.
> > > > > >
> > > > > > On Tue, Dec 2, 2014 at 2:06 PM, Ufuk Celebi <uce@apache.org> wrote:
> > > > > >
> > > > > > > Very nice to hear :)
> > > > > > >
> > > > > > > See this thread:
> > > > > > >
> > > > > > > http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Enhance-Flink-s-monitoring-capabilities-td2573.html
> > > > > > >
> > > > > > > On Tue, Dec 2, 2014 at 2:00 PM, Alexander Alexandrov <
> > > > > > > alexander.s.alexandrov@gmail.com> wrote:
> > > > > > >
> > > > > > > > Just a quick shout to check whether somebody is already
> > > > > > > > working on a statistics collection component?
> > > > > > > >
> > > > > > > > If yes, can you point me to previous discussions in the
> > > > > > > > mailing list and a WIP branch -- I want to bring myself up
> > > > > > > > to date with the ongoing efforts.
> > > > > > > >
> > > > > > > > If not, I would like to start working on that component and
> > > > > > > > ideally integrate some parts of it in the 0.8 release.
> > > > > > > >
> > > > > > > > Cheers!
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
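
To make the identity-mapper idea quoted above a bit more concrete, here is a
minimal sketch in plain Python, not Flink code. It assumes nothing about
Flink's actual API: CollectionStats, observe, merge and identity_with_stats
are hypothetical names invented for illustration. Each parallel task would
keep one summary while passing records through unchanged, and the per-task
summaries would then be merged in one place (the JM, in the thread's terms).
The exact set used for distinct values is a simplification; a real
implementation would probably need a bounded-memory estimate instead.

```python
from dataclasses import dataclass, field
from typing import Any, Iterable, Iterator, Optional, Set


@dataclass
class CollectionStats:
    """Mergeable summary of the records seen by one task (hypothetical type)."""
    count: int = 0
    minimum: Optional[Any] = None
    maximum: Optional[Any] = None
    # Assumption: an exact set is acceptable here; memory grows with cardinality.
    distinct: Set[Any] = field(default_factory=set)

    def observe(self, value: Any) -> None:
        """Update the summary with a single record."""
        self.count += 1
        if self.minimum is None or value < self.minimum:
            self.minimum = value
        if self.maximum is None or value > self.maximum:
            self.maximum = value
        self.distinct.add(value)

    def merge(self, other: "CollectionStats") -> "CollectionStats":
        """Combine two per-task summaries into a new one, e.g. on a coordinator."""
        mins = [m for m in (self.minimum, other.minimum) if m is not None]
        maxs = [m for m in (self.maximum, other.maximum) if m is not None]
        return CollectionStats(
            count=self.count + other.count,
            minimum=min(mins) if mins else None,
            maximum=max(maxs) if maxs else None,
            distinct=self.distinct | other.distinct,
        )


def identity_with_stats(records: Iterable[Any], stats: CollectionStats) -> Iterator[Any]:
    """Yield every record unchanged while updating `stats` as a side effect."""
    for record in records:
        stats.observe(record)
        yield record
```

A task would wrap its output as identity_with_stats(records, local_stats),
and the collected summaries could then be folded together with merge. How the
summaries are shipped between processes is deliberately left out of the
sketch.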
