flink-issues mailing list archives

From "Dongwon Kim (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.
Date Thu, 18 Feb 2016 02:30:18 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15151346#comment-15151346 ]

Dongwon Kim edited comment on FLINK-1502 at 2/18/16 2:30 AM:
-------------------------------------------------------------

To [~StephanEwen], [~mxm], [~jgrier], 
First of all, sorry for the late response.

We just need to make each TaskManager report its metrics to JMX/Ganglia/Graphite as you guys
suggested.

To [~mxm], 
The main problem with such a design is that a newly launched TaskManager is given
a randomly generated UUID, which will create too many Ganglia metrics, as [~jgrier] mentioned
above.
I think [~jgrier]'s solution is quite simple yet viable:

cluster.<CLUSTER_NAME>.taskmanager.1.gc_time
cluster.<CLUSTER_NAME>.taskmanager.2.gc_time

To that end, we need to open a new issue to assign such IDs to TaskManagers running on the
same host.
One concern is that we would need such numbering even when only one TaskManager is running
on each node, e.g. <CLUSTER_NAME>.taskmanager.1.gc_time.
I'm okay with it, but users might find the numbering quite ugly.
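
To make the numbering idea concrete, here is a rough sketch of how per-host indices could be handed out and turned into metric names. This is only an illustration under my own assumptions: the class TaskManagerMetricNames and its methods are hypothetical and not part of Flink, indices are assumed to start at 1, and a freed index is assumed to be reused so that a restarted TaskManager does not create new metric names.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not a Flink class): assigns small, reusable per-host TaskManager indices. */
final class TaskManagerMetricNames {
    private final String clusterName;
    // For each host, the set of indices currently held by running TaskManagers.
    private final Map<String, BitSet> usedIndicesPerHost = new HashMap<>();

    TaskManagerMetricNames(String clusterName) {
        this.clusterName = clusterName;
    }

    /** Returns the smallest free index (starting at 1) on the given host and marks it as used. */
    synchronized int acquireIndex(String host) {
        BitSet used = usedIndicesPerHost.computeIfAbsent(host, h -> new BitSet());
        int index = used.nextClearBit(1);
        used.set(index);
        return index;
    }

    /** Frees an index so that the next TaskManager started on the same host can reuse it. */
    synchronized void releaseIndex(String host, int index) {
        BitSet used = usedIndicesPerHost.get(host);
        if (used != null) {
            used.clear(index);
        }
    }

    /** Builds a name of the form cluster.<CLUSTER_NAME>.taskmanager.<index>.<metric>. */
    String metricName(int taskManagerIndex, String metric) {
        return "cluster." + clusterName + ".taskmanager." + taskManagerIndex + "." + metric;
    }
}
```

With something like this, the set of metric names per host stays bounded by the maximum number of TaskManagers that run there at the same time, instead of growing with every newly generated UUID.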

What do you guys think?


was (Author: eastcirclek):
To [~StephanEwen], [~mxm], [~jgrier], 
First of all, sorry for the late response.

We just need to make each TaskManager report its metrics to JMX/Ganglia/Graphite as you guys
suggested.

To [~mxm], 
The main problem with such a design is that a newly launched TaskManager is given
a randomly generated UUID, which will create too many Ganglia metrics, as [~jgrier] mentioned
above.
I think [~jgrier]'s solution is quite simple yet viable:

cluster.<CLUSTER_NAME>.taskmanager.1.gc_time
cluster.<CLUSTER_NAME>.taskmanager.2.gc_time

To that end, we need to open a new issue to assign such IDs to TaskManagers running on the
same host.
One concern is that, even with only one TaskManager running on each node, we would need such numbering
(e.g. <CLUSTER_NAME>.taskmanager.1.gc_time).
I'm okay with it, but users might find the numbering quite ugly.

What do you guys think?

> Expose metrics to graphite, ganglia and JMX.
> --------------------------------------------
>
>                 Key: FLINK-1502
>                 URL: https://issues.apache.org/jira/browse/FLINK-1502
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager, TaskManager
>    Affects Versions: 0.9
>            Reporter: Robert Metzger
>            Assignee: Dongwon Kim
>            Priority: Minor
>             Fix For: pre-apache
>
>
> The metrics library makes it easy to expose collected metrics to other systems such as
> graphite, ganglia or Java's JVM (VisualVM).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
