flink-dev mailing list archives

From Henry Saputra <henry.sapu...@gmail.com>
Subject Re: Enhance Flink's monitoring capabilities
Date Fri, 12 Dec 2014 05:18:22 GMT
Thanks, Robert. It looks like we could use this JIRA to do the work.

- Henry

On Thu, Dec 11, 2014 at 9:25 AM, Robert Metzger <rmetzger@apache.org> wrote:
> I think this (very old) issue describes the feature fairly closely:
> https://issues.apache.org/jira/browse/FLINK-456
>
>
>
> On Thu, Dec 11, 2014 at 8:32 AM, Henry Saputra <henry.saputra@gmail.com>
> wrote:
>
>> Just curious, is there any JIRA filed for this or was it just in
>> preliminary proposal talk?
>>
>> - Henry
>>
>> On Sun, Dec 7, 2014 at 3:36 PM, Stephan Ewen <sewen@apache.org> wrote:
>> > That actually sounds like a great idea. I discussed a bit with Robert
>> > offline on Friday, and it seems that Metrics has most of what we talked
>> > about.
>> >
>> > I also like the way they make it extensible, so people can capture their
>> > own metrics.
>> >
>> > On Sun, Dec 7, 2014 at 6:02 AM, Henry Saputra <henry.saputra@gmail.com>
>> > wrote:
>> >
>> >> Hi Robert,
>> >>
>> >> From what I have seen so far, it is probably better and easier for Flink
>> >> to leverage the Metrics library [1] for metrics collection rather than
>> >> building one organically.
>> >>
>> >> Several ASF projects like Spark [2] and Tajo [3] have used it with great
>> >> success.
>> >>
>> >> The main reasons are maintainability and the breadth of metric types
>> >> that could and should be collected.
>> >>
>> >> - Henry
>> >>
>> >> [1] https://dropwizard.github.io/metrics/3.1.0/getting-started/
>> >> [2] https://spark.apache.org/docs/1.0.1/monitoring.html
>> >> [3] https://issues.apache.org/jira/browse/TAJO-333
>> >>
>> >> On Sat, Dec 6, 2014 at 11:13 AM, Robert Metzger <rmetzger@apache.org>
>> >> wrote:
>> >> > Hey Nils,
>> >> >
>> >> > I have played around a bit with a little prototype. You can find the code
>> >> > here: https://github.com/rmetzger/incubator-flink/tree/flink456 (it's
>> >> > another branch in my repo).
>> >> > You can see the changes that I applied on top of Till's Akka branch here:
>> >> > https://github.com/rmetzger/incubator-flink/compare/tillrohrmann:akka_scala...rmetzger:flink456?expand=1
>> >> >
>> >> > What the code does is collect statistics about each TaskManager in the
>> >> > system. These stats are assembled into a "MetricsReport", which is sent with
>> >> > the periodic heartbeat to the JobManager. The JobManager stores the
>> >> > latest MetricsReport for each TaskManager (in the Instance object for each
>> >> > TM).
>> >> > When the user accesses the TaskManager overview, the latest MetricsReport
>> >> > is sent as a JSONObject to the browser.
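
The heartbeat flow described above can be sketched roughly as follows. This is a minimal Python illustration, not the prototype's actual code: the classes TaskManager and JobManager only mirror the roles named in the message, and the fields of MetricsReport, the method names, and the metric key heap_used_bytes are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class MetricsReport:
    """Snapshot of one TaskManager's statistics (field names are assumed)."""
    task_manager_id: str
    timestamp_ms: int
    metrics: dict = field(default_factory=dict)


class TaskManager:
    def __init__(self, tm_id):
        self.tm_id = tm_id

    def heartbeat(self, metrics, now_ms):
        # Assemble the current stats into a report that rides along
        # with the periodic heartbeat message.
        return MetricsReport(self.tm_id, now_ms, dict(metrics))


class JobManager:
    def __init__(self):
        # Only the latest report per TaskManager is kept.
        self.latest = {}

    def receive_heartbeat(self, report):
        # A newer report simply overwrites the previous one.
        self.latest[report.task_manager_id] = report

    def overview_json(self):
        # What the web UI would fetch for the TaskManager overview.
        reports = [asdict(r) for r in self.latest.values()]
        return json.dumps(reports, sort_keys=True)


if __name__ == "__main__":
    jm = JobManager()
    tm = TaskManager("tm-1")
    jm.receive_heartbeat(tm.heartbeat({"heap_used_bytes": 10}, now_ms=1))
    jm.receive_heartbeat(tm.heartbeat({"heap_used_bytes": 20}, now_ms=2))
    print(jm.overview_json())
```

Because the JobManager keeps one report per TaskManager, the second heartbeat replaces the first, which is why the overview only ever shows a snapshot.
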
>> >> >
>> >> > To test my changes, check out the code and build it:
>> >> >  mvn clean package -DskipTests -Dcheckstyle.skip=true
>> >> > go into
>> >> >  cd flink-dist/target/flink-0.8-incubating-SNAPSHOT-bin/flink-0.8-incubating-SNAPSHOT/
>> >> > and start the web interface
>> >> >  ./bin/start-local.sh
>> >> >
>> >> > Go to localhost:8081; in the "TaskManager" view, you can see some metrics.
>> >> > Here is a screenshot: http://img42.com/eNPve
>> >> >
>> >> > I named my branch after this issue, as it probably best describes what
>> >> > we're working on here: FLINK-456
>> >> > <https://issues.apache.org/jira/browse/FLINK-456>
>> >> >
>> >> > As I said in the beginning, it's really just a prototype. Let me know if you
>> >> > have any further questions.
>> >> > For the "per TaskManager" reports, we should probably integrate some more
>> >> > statistics. Also, the presentation of the numbers is very basic right
>> >> > now. I think there are many good libraries for visualizing these kinds of
>> >> > stats.
>> >> > Also, the numbers currently represent only a "snapshot"; however, some of
>> >> > the numbers can be accumulated (read/write bytes of the I/O manager).
>> >> > Another missing feature is storing a little history of numbers to visualize
>> >> > metrics over time.
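
One way to keep "a little history" of an accumulated number is a bounded buffer of samples per metric. The following is a minimal Python sketch under that assumption; the class MetricHistory, its methods, and the sample values are hypothetical and do not come from the prototype.

```python
from collections import deque


class MetricHistory:
    """Bounded history of (timestamp_ms, value) samples for one metric."""

    def __init__(self, max_samples=60):
        # deque with maxlen drops the oldest sample once the buffer is full.
        self.samples = deque(maxlen=max_samples)

    def record(self, timestamp_ms, value):
        self.samples.append((timestamp_ms, value))

    def deltas(self):
        # For an accumulated counter (e.g. bytes written so far), the
        # difference between consecutive samples is the per-interval amount.
        pts = list(self.samples)
        return [(t1, v1 - v0) for (_, v0), (t1, v1) in zip(pts, pts[1:])]


if __name__ == "__main__":
    history = MetricHistory(max_samples=3)
    for ts, written in [(1, 100), (2, 150), (3, 180), (4, 260)]:
        history.record(ts, written)
    print(list(history.samples))
    print(history.deltas())
```

Storing the raw accumulated values and deriving deltas on demand keeps the stored data simple, and the fixed buffer size bounds the memory needed per metric.
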
>> >> >
>> >> > I'm trying to find time to look into "per job" metrics as well. They will
>> >> > require a bit more infrastructure to distinguish them on the JobManager
>> >> > side and to get them on the TaskManagers.
>> >> >
>> >> >
>> >> > Best,
>> >> > Robert
>> >> >
>> >> >
>> >> >
>> >> > On Tue, Dec 2, 2014 at 2:53 PM, aalexandrov <
>> >> > alexander.s.alexandrov@gmail.com> wrote:
>> >> >
>> >> >> Hello Nils,
>> >> >>
>> >> >> I am going to work on a similar issue related to tracking some basic
>> >> >> statistics of the intermediate results produced by dataflows during
>> >> >> execution.
>> >> >>
>> >> >> I just created a Jira issue here:
>> >> >>
>> >> >> https://issues.apache.org/jira/browse/FLINK-1297
>> >> >>
>> >> >> If you already have some work done on extending the monitoring capabilities
>> >> >> in a branch, it might be good to sync up the development in order to avoid
>> >> >> duplicated work (e.g. using the same communication channel used to send the
>> >> >> data from the task managers to the job manager).
>> >> >>
>> >> >>
>> >> >>
>> >> >> --
>> >> >> View this message in context:
>> >> >> http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/Enhance-Flink-s-monitoring-capabilities-tp2573p2713.html
>> >> >> Sent from the Apache Flink (Incubator) Mailing List archive at Nabble.com.
>> >> >>
>> >>
>>
