flink-issues mailing list archives

From "Dongwon Kim (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (FLINK-1502) Expose metrics to graphite, ganglia and JMX.
Date Tue, 23 Feb 2016 00:42:18 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15158023#comment-15158023 ]

Dongwon Kim edited comment on FLINK-1502 at 2/23/16 12:41 AM:
--------------------------------------------------------------

Let's consider the following scenario:
||Sessions||Node1||Node2||Node3||
|Session1|TM1|TM2|TM3|
|Session2|TM2|TM3|TM1|
|Session3|TM3|TM2|TM1|

After Session1 is finished, Node1 has the following metric:
- cluster.MyCluster.taskmanager.1.gc_time 

After Session2 is finished, Node1 has the following metrics:
- cluster.MyCluster.taskmanager.1.gc_time 
- cluster.MyCluster.taskmanager.2.gc_time

After Session3 is finished, Node1 has the following metrics:
- cluster.MyCluster.taskmanager.1.gc_time 
- cluster.MyCluster.taskmanager.2.gc_time
- cluster.MyCluster.taskmanager.3.gc_time
At this point, a user has to work out which of the three metrics above belongs to the current session.
The problem gets worse as the user launches more TaskManagers.
For example, 500 TaskManagers over multiple sessions can leave up to 500 metrics on each host.

Wouldn't it be better to assign indexes to TaskManagers scoped to each host?
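
To make the scenario concrete, here is a small sketch in Python (not Flink code) that replays the three sessions from the table and collects the metric names each node ends up with. The function names and the SESSIONS dict are made up for illustration, and the host-scoped variant assumes one TaskManager per node, so the index on every node is always 1.

```python
from collections import defaultdict

# Placement copied from the table above (hypothetical data structure):
# session -> {node: cluster-wide TaskManager index in that session}
SESSIONS = {
    "Session1": {"Node1": 1, "Node2": 2, "Node3": 3},
    "Session2": {"Node1": 2, "Node2": 3, "Node3": 1},
    "Session3": {"Node1": 3, "Node2": 2, "Node3": 1},
}


def global_index_metrics(sessions, cluster="MyCluster", metric="gc_time"):
    """Metric names left on each node when the TaskManager index is cluster-wide."""
    seen = defaultdict(set)
    for placement in sessions.values():
        for node, tm_index in placement.items():
            seen[node].add(f"cluster.{cluster}.taskmanager.{tm_index}.{metric}")
    return {node: sorted(names) for node, names in seen.items()}


def host_scoped_metrics(sessions, cluster="MyCluster", metric="gc_time"):
    """Metric names when each node numbers its own TaskManagers starting at 1.

    Assumption: one TaskManager per node, so the host-scoped index is always 1.
    """
    seen = defaultdict(set)
    for placement in sessions.values():
        for node in placement:
            seen[node].add(f"cluster.{cluster}.taskmanager.1.{metric}")
    return {node: sorted(names) for node, names in seen.items()}


global_names = global_index_metrics(SESSIONS)
scoped_names = host_scoped_metrics(SESSIONS)

for node in sorted(global_names):
    print(node, "cluster-wide index:", len(global_names[node]), "metric name(s)")
    print(node, "host-scoped index: ", len(scoped_names[node]), "metric name(s)")
```

With cluster-wide indexes the set of names on a node grows with every distinct index that node has ever hosted. With host-scoped indexes it stays at one name per TaskManager slot on that node, at the cost of no longer distinguishing sessions by name.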

P.S.
I'm going to start without considering multiple TaskManagers on the same node, since we haven't yet reached a consensus on that.
I still think we need to continue this discussion.



> Expose metrics to graphite, ganglia and JMX.
> --------------------------------------------
>
>                 Key: FLINK-1502
>                 URL: https://issues.apache.org/jira/browse/FLINK-1502
>             Project: Flink
>          Issue Type: Sub-task
>          Components: JobManager, TaskManager
>    Affects Versions: 0.9
>            Reporter: Robert Metzger
>            Assignee: Dongwon Kim
>            Priority: Minor
>             Fix For: pre-apache
>
>
> The metrics library makes it easy to expose collected metrics to other systems such as graphite, ganglia or Java's JVM (VisualVM).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
