flink-dev mailing list archives

From Henry Saputra <henry.sapu...@gmail.com>
Subject Re: Enhance Flink's monitoring capabilities
Date Sun, 07 Dec 2014 05:02:17 GMT
Hi Robert,

From what I have seen so far, it is probably better and easier for Flink
to leverage the Metrics library [1] for metrics collection rather than
building something organically.

Several ASF projects, such as Spark [2] and Tajo [3], have used it with great success.

The main reasons are maintainability and the breadth of metric types that
could and should be collected.

- Henry

[1] https://dropwizard.github.io/metrics/3.1.0/getting-started/
[2] https://spark.apache.org/docs/1.0.1/monitoring.html
[3] https://issues.apache.org/jira/browse/TAJO-333

On Sat, Dec 6, 2014 at 11:13 AM, Robert Metzger <rmetzger@apache.org> wrote:
> Hey Nils,
> I have played around a bit with a little prototype. You can find the code
> here: https://github.com/rmetzger/incubator-flink/tree/flink456 (it's
> another branch in my repo).
> You can see the changes that I applied on top of Till's Akka branch here:
> https://github.com/rmetzger/incubator-flink/compare/tillrohrmann:akka_scala...rmetzger:flink456?expand=1
> The code collects statistics about each TaskManager in the system. These
> stats are assembled into a "MetricsReport", which is sent with the periodic
> heartbeat to the JobManager. The JobManager stores the latest MetricsReport
> for each TaskManager (in the Instance object for each TM).
> When the user accesses the TaskManager overview, the latest MetricsReport
> is sent as a JSONObject to the browser.
> To test my changes, check out the code and build it:
>   mvn clean package -DskipTests -Dcheckstyle.skip=true
> Then go into the distribution directory:
>   cd flink-dist/target/flink-0.8-incubating-SNAPSHOT-bin/flink-0.8-incubating-SNAPSHOT/
> and start the web interface:
>   ./bin/start-local.sh
> Go to localhost:8081, in the "TaskManager" view, you can see some metrics.
> Here is a screenshot: http://img42.com/eNPve
> I named my branch after this issue, as it probably best describes what
> we're working on here: FLINK-456
> <https://issues.apache.org/jira/browse/FLINK-456>
> As I said in the beginning, it's really just a prototype. Let me know if you
> have any further questions.
> For the "per TaskManager" reports, we should probably integrate some more
> statistics. Also, the presentation of the numbers is very very basic right
> now. I think there are many good libraries for visualizing these kinds of
> stats.
> Also, the numbers currently represent only a "snapshot". However, some of
> them could be accumulated (e.g. read/write bytes of the IO manager).
> Another missing feature is storing a little history of numbers to visualize
> metrics over time.
> I'm trying to find time to look into "per job" metrics as well. They will
> require a bit more infrastructure to distinguish them on the JobManager
> side and to get them on the TaskManagers.
> Best,
> Robert
> On Tue, Dec 2, 2014 at 2:53 PM, aalexandrov <
> alexander.s.alexandrov@gmail.com> wrote:
>> Hello Nils,
>> I am going to work on a similar issue related to tracking some basic
>> statistics of the intermediate results produced by dataflows during
>> execution.
>> I just created a Jira issue here:
>> https://issues.apache.org/jira/browse/FLINK-1297
>> If you already have some work done on extending the monitoring capabilities
>> in a branch, it might be good to sync up the development in order to avoid
>> duplicated work (e.g. by using the same communication channel to send the
>> data from the task managers to the job manager).
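
As a rough illustration of the reporting scheme described in the quoted message (each TaskManager sends a MetricsReport with its heartbeat, the JobManager keeps the latest one per TaskManager and serves it as JSON), here is a minimal Java sketch. It is not taken from the prototype branch. The MetricsStore class, the report's field names, the hand-written JSON, and the bounded history are all assumptions made for illustration; the history and the windowed byte count only hint at the "history of numbers" and "accumulated" values mentioned as missing features.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical snapshot of one TaskManager's statistics (field names are assumptions). */
final class MetricsReport {
    final long timestampMillis;
    final long heapUsedBytes; // point-in-time value
    final long ioBytesRead;   // cumulative counter since TaskManager start

    MetricsReport(long timestampMillis, long heapUsedBytes, long ioBytesRead) {
        this.timestampMillis = timestampMillis;
        this.heapUsedBytes = heapUsedBytes;
        this.ioBytesRead = ioBytesRead;
    }

    /** Hand-written JSON to keep the sketch dependency-free. */
    String toJson() {
        return "{\"timestampMillis\":" + timestampMillis
                + ",\"heapUsedBytes\":" + heapUsedBytes
                + ",\"ioBytesRead\":" + ioBytesRead + "}";
    }
}

/** Hypothetical JobManager-side store: latest report per TaskManager plus a short history. */
final class MetricsStore {
    private final int historySize;
    private final Map<String, Deque<MetricsReport>> reportsByTaskManager = new HashMap<>();

    MetricsStore(int historySize) {
        if (historySize < 1) {
            throw new IllegalArgumentException("historySize must be >= 1");
        }
        this.historySize = historySize;
    }

    /** Called whenever a heartbeat carrying a report arrives. */
    synchronized void onHeartbeat(String taskManagerId, MetricsReport report) {
        Deque<MetricsReport> reports =
                reportsByTaskManager.computeIfAbsent(taskManagerId, id -> new ArrayDeque<>());
        if (reports.size() == historySize) {
            reports.removeFirst(); // drop the oldest report to keep the history bounded
        }
        reports.addLast(report);
    }

    /** Latest report for a TaskManager, or null if none has been received. */
    synchronized MetricsReport latest(String taskManagerId) {
        Deque<MetricsReport> reports = reportsByTaskManager.get(taskManagerId);
        return reports == null ? null : reports.peekLast();
    }

    /** Number of reports currently retained for a TaskManager. */
    synchronized int historyLength(String taskManagerId) {
        Deque<MetricsReport> reports = reportsByTaskManager.get(taskManagerId);
        return reports == null ? 0 : reports.size();
    }

    /** Bytes read between the oldest and the newest retained report. */
    synchronized long bytesReadInWindow(String taskManagerId) {
        Deque<MetricsReport> reports = reportsByTaskManager.get(taskManagerId);
        if (reports == null || reports.isEmpty()) {
            return 0L;
        }
        return reports.peekLast().ioBytesRead - reports.peekFirst().ioBytesRead;
    }
}
```

In this sketch the web frontend would call latest(id).toJson() for the TaskManager overview, while the retained history could feed a chart of metrics over time. A library such as the Metrics library suggested above would replace most of this hand-rolled bookkeeping.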
