flink-issues mailing list archives

From "Stephan Ewen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-964) Integrate profiling code with web interface
Date Tue, 26 Aug 2014 10:11:59 GMT

    [ https://issues.apache.org/jira/browse/FLINK-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110574#comment-14110574 ]

Stephan Ewen commented on FLINK-964:

Very cool first prototype, I like it!

I am posting a quick summary of the status and the other ideas that have been floating around
in the context of the job profiling:

 - There is quite a bit of profiling data gathered, but I think some of it is also a bit out
of date (for example, the gate profiling no longer works or makes sense because the
internal models changed).

 - We are currently thinking of gathering data stats (byte and record counts) from the operators
as well. This could go well together with the profiling. It would be good if the profiling
code were generic, in the sense that it allows transferring arbitrary time series of metrics.
It makes sense to define scopes for these metrics, for example "global (cluster profiling)",
"single machine (machine profiling)", and "operator", so that each metric is displayed in the
respective section of the web frontend.

 - The memory profiling is a bit senseless right now, because the JVMs are always of roughly
the same memory size once ramped up. Instead, I would add the "managed memory" of Flink.

 - I think a lot of the machine profiling code (CPU utilization, network throughput) currently
works only on Linux.
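
To make the idea of generic, scoped metric time series from the list above a bit more concrete,
here is a minimal sketch. It is not Flink code and not a proposed API: the names Scope, Sample,
MetricStore, report, and by_scope are all hypothetical, and Python is used only to keep the
illustration short. It assumes a metric is identified by its scope, its source (a machine or an
operator), and its name.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    """Hypothetical metric scopes, named after the ones suggested in the comment."""
    GLOBAL = "global"
    MACHINE = "machine"
    OPERATOR = "operator"


@dataclass(frozen=True)
class Sample:
    """One point of a time series."""
    timestamp_ms: int
    value: float


class MetricStore:
    """Collects arbitrary named time series, grouped by scope."""

    def __init__(self):
        # (scope, source, metric name) -> samples in arrival order
        self._series = defaultdict(list)

    def report(self, scope, source, name, timestamp_ms, value):
        """Append one sample to the series identified by scope, source, and name."""
        self._series[(scope, source, name)].append(Sample(timestamp_ms, value))

    def series(self, scope, source, name):
        """Return a copy of one series, or an empty list if nothing was reported."""
        return list(self._series.get((scope, source, name), []))

    def by_scope(self, scope):
        """Return {(source, name): samples} for one section of a frontend."""
        return {
            (src, name): list(samples)
            for (sc, src, name), samples in self._series.items()
            if sc is scope
        }


# Illustrative values only; the metric names are made up.
store = MetricStore()
store.report(Scope.OPERATOR, "map-1", "records_out", 1000, 250.0)
store.report(Scope.OPERATOR, "map-1", "records_out", 2000, 610.0)
store.report(Scope.MACHINE, "worker-a", "managed_memory_bytes", 1000, 5.0e8)
```

With a structure like this, record counts, byte counts, and managed memory would all travel
through the same path, and the frontend would only need by_scope to fill each section.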

As a side note: I think it makes sense to integrate the currently separate profiling
communication (RPC) with the regular coordination RPCs. That should be a transparent change
(probably 50 lines) once we have Till's changes merged, which base the distributed coordination on Akka.

> Integrate profiling code with web interface
> -------------------------------------------
>                 Key: FLINK-964
>                 URL: https://issues.apache.org/jira/browse/FLINK-964
>             Project: Flink
>          Issue Type: Improvement
>          Components: Local Runtime, Webfrontend
>    Affects Versions: 0.6-incubating
>            Reporter: Stephan Ewen
>            Assignee: Jonathan Hasenburg
> This issue is subject to discussion.
> The profiling code currently needs to be kept in sync with the job graph code, execution graph code, and runtime code.
> Since that part of the code is undergoing quite some changes and the profiling code is not used right now, I suggest removing it, or moving it to an artifact repository.

This message was sent by Atlassian JIRA
