hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-5751) Support UNIT for TimelineMetric
Date Wed, 19 Oct 2016 17:09:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5751?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588799#comment-15588799 ]

Varun Saxena edited comment on YARN-5751 at 10/19/16 5:09 PM:
--------------------------------------------------------------

Thanks [~rohithsharma] for sharing your views.

I do understand that it is not always clear what each metric value means. For instance, when I
first looked at the REST output from the timeline service, I had to go back to the code to find
out whether MEMORY reported by the NM for each container is in bytes, KB or MB.

Let us assume that we add UNIT to TimelineMetric to indicate KB, MB, etc. The question is then
how do we store it? Currently the metric name is stored as a column qualifier and the metric
value as the column value, with HBase cell timestamps used for the metric timestamps. So where
do we store this extra information?
It could probably be stored as a suffix to the metric name, but that would impact metric
filters. Or we could add another column, with the metric name prefixed by a marker indicating
UNIT (say, something like u!MEMORY), to store the metric unit, and either read it back at all
times or create the necessary column filters if the metrics to retrieve are specified. I would
choose the latter if I had to pick one option.
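
To make the separate-column option a bit more concrete, here is a rough sketch in plain Java. It
does not use any real HBase or ATSv2 API: a row is modelled as a sorted map from column qualifier
to value, cell timestamps are left out, and the class name, method names and the "u!" prefix are
all assumptions made for illustration only.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: one row modelled as a sorted map from column
// qualifier to value. Real storage would go through HBase; none of
// these names exist in ATSv2.
final class MetricUnitColumns {
    // Assumed marker for a "unit" column, following the u!MEMORY idea.
    static final String UNIT_PREFIX = "u!";

    // Builds the qualifier of the column that holds a metric's unit.
    static String unitQualifier(String metricName) {
        return UNIT_PREFIX + metricName;
    }

    // Stores the value under the unchanged metric name, so filters on the
    // metric name keep working, and the unit (if any) in a second column.
    static void putMetric(Map<String, String> row, String metricName,
                          long value, String unit) {
        row.put(metricName, Long.toString(value));
        if (unit != null) {
            row.put(unitQualifier(metricName), unit);
        }
    }

    // Returns the stored unit, or null when the metric has no unit column.
    static String readUnit(Map<String, String> row, String metricName) {
        return row.get(unitQualifier(metricName));
    }

    public static void main(String[] args) {
        Map<String, String> row = new TreeMap<>();
        putMetric(row, "MEMORY", 2048L, "MB");
        putMetric(row, "CPU", 75L, null); // unit is optional
        System.out.println(row);
        System.out.println(readUnit(row, "MEMORY"));
    }
}
```

The value column keeps its original qualifier, so existing metric filters are untouched; the cost
is one extra column per metric that carries a unit.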

But the question is, can't the metric name itself indicate the unit of the metric? For instance,
most of the MapReduce counter names indicate the unit too. We could publish MEMORY as MEMORY_BYTES
instead.
Or is it even required? Typically the systems publishing to us would know the unit of the
metric they are writing, and hence would know what they are reading back. Apart from admins, it
is unlikely that anybody is going to use the REST URLs directly. These endpoints will typically
be used by a system which has another front-end to serve this data. We can probably make metric
names published from YARN or MapReduce more understandable (i.e. suffixed with units) in case
somebody has to interpret the REST output directly. Thoughts?
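
The naming alternative, publishing MEMORY as MEMORY_BYTES, needs no storage change at all. A
minimal sketch of what the publishing side could do is below; the MetricNames class and its
withUnit helper are hypothetical and not part of YARN or MapReduce.

```java
import java.util.Locale;

// Hypothetical helper: derives a self-describing metric name by
// appending the unit as an upper-case suffix.
final class MetricNames {
    static String withUnit(String baseName, String unit) {
        if (unit == null || unit.isEmpty()) {
            return baseName; // metrics without a unit keep their plain name
        }
        return baseName + "_" + unit.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(withUnit("MEMORY", "bytes"));
        System.out.println(withUnit("CPU", null));
    }
}
```

Readers and metric filters would then have to ask for the suffixed name, which is why this works
best if the name is fixed by the publisher from the start.
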
You may say that this argument is based on HBase storage, but that is our primary storage
implementation for now. So what to store and what not to store may depend on a combination of
necessity and feasibility.
I am not completely sure the need to store the unit is strong enough to justify another column
qualifier in the HBase implementation. We can probably adopt the approach mentioned above if we
have to store it. Do you have any other ideas on how to store it?
Is the concern that one code path may change (say, the publishing side) and the other may not
(say, UI rendering) if we do not make the unit part of our model?

Let us see what others think though.
cc [~sjlee0], [~gtCarrera9]


> Support UNIT for TimelineMetric
> -------------------------------
>
>                 Key: YARN-5751
>                 URL: https://issues.apache.org/jira/browse/YARN-5751
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: ATSv2
>            Reporter: Rohith Sharma K S
>            Assignee: Rohith Sharma K S
>            Priority: Critical
>
> ATSv2 allows users to write their metrics using TimelineMetric. But there is no field
> to tell what the UNIT of a published metric is. This makes metrics very difficult to
> interpret when they are read.
> I propose to add UNIT to TimelineMetric so that users can use this field to state the
> unit of the published metric. Maybe this can be optional for a few kinds of metrics
> where a unit is not required, say CPU. But there should definitely be a way to set units
> while publishing the entities.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
