hadoop-yarn-issues mailing list archives

From "Naganarasimha G R (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-4712) CPU Usage Metric is not captured properly in YARN-2928
Date Sun, 06 Mar 2016 20:47:40 GMT

     [ https://issues.apache.org/jira/browse/YARN-4712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Naganarasimha G R updated YARN-4712:
    Attachment: YARN-4712-YARN-2928.v1.004.patch

Thanks for the comments, [~sjlee0].
I am reverting the changes to the trunk code and limiting the scope to YARN-2928.
bq. My position is that we should skip reporting the value rather than reporting 0.
IIUC the existing patches already take care of this: I set *cpuUsageTotalCoresPercentage*
to -1 when *cpuUsagePercentPerCore* is -1, and *NMTimelinePublisher* skips reporting when
*cpuUsageTotalCoresPercentage* is -1.
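
A minimal sketch of the guard described above, not the code from the attached patch. The class and method names ({{CpuUsageSketch}}, {{totalCoresPercentage}}, {{shouldReport}}) are hypothetical; only the -1 sentinel and the division by the processor count come from this issue.

```java
public class CpuUsageSketch {
    // Mirrors the -1 value of ResourceCalculatorProcessTree.UNAVAILABLE mentioned in this issue.
    static final int UNAVAILABLE = -1;

    /** Divides per-core usage by the processor count, but passes the sentinel through unchanged. */
    static float totalCoresPercentage(float cpuUsagePercentPerCore, int numProcessors) {
        if (cpuUsagePercentPerCore == UNAVAILABLE) {
            return UNAVAILABLE;
        }
        return cpuUsagePercentPerCore / numProcessors;
    }

    /** Publisher-side check (hypothetical helper): report only when a real sample exists. */
    static boolean shouldReport(float cpuUsageTotalCoresPercentage) {
        return cpuUsageTotalCoresPercentage != UNAVAILABLE;
    }

    public static void main(String[] args) {
        System.out.println(totalCoresPercentage(200f, 4));        // 50.0
        System.out.println(totalCoresPercentage(UNAVAILABLE, 4)); // -1.0, sentinel preserved
        // Without the guard, -1 / 4 = -0.25 would pass the publisher's check:
        System.out.println(shouldReport(-1f / 4));                // true
    }
}
```

The last line shows why the check has to happen before the division: once -1 has been divided by the processor count, an equality test against -1 no longer matches.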

bq. Most of YARN's CPU accounting is based on cores rather than nodes/machines. IMO cpuUsagePercentPerCore
would be a better value to emit. Thoughts?
IMO *cpuUsageTotalCoresPercentage* is important for gauging how much of the cluster's CPU is
being utilized. I believe *cpuUsagePercentPerCore* does not give the cluster's CPU usage
when aggregated across all containers. In fact we need to report both. Also, IMO
*cpuUsageTotalCoresPercentage* is not calculated properly; it should be
cpuUsageTotalCoresPercentage = (cpuUsagePercentPerCore /resourceCalculatorPlugin.getNumProcessors())/
In this way we will be able to identify what percentage of the cluster's CPU is being utilized.
Also, do we broaden the scope of this jira further, or shall we discuss this in a different

bq. Why are we appending the process id to the metric id? Doesn't this cause issues when we
do the aggregation? 
Agreed, I have handled this. I believe YARN-3816 was also trying to address it, but as that might
take a little more time, I am addressing it as part of this jira.

> CPU Usage Metric is not captured properly in YARN-2928
> ------------------------------------------------------
>                 Key: YARN-4712
>                 URL: https://issues.apache.org/jira/browse/YARN-4712
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: timelineserver
>            Reporter: Naganarasimha G R
>            Assignee: Naganarasimha G R
>              Labels: yarn-2928-1st-milestone
>         Attachments: YARN-4712-YARN-2928.v1.001.patch, YARN-4712-YARN-2928.v1.002.patch,
YARN-4712-YARN-2928.v1.003.patch, YARN-4712-YARN-2928.v1.004.patch
> There are 2 issues with CPU usage collection:
> * I observed that the CPU usage obtained from {{pTree.getCpuUsagePercent()}} is often
> ResourceCalculatorProcessTree.UNAVAILABLE (i.e. -1), but ContainersMonitor still does the calculation,
> i.e. {{cpuUsageTotalCoresPercentage = cpuUsagePercentPerCore /resourceCalculatorPlugin.getNumProcessors()}},
> so the UNAVAILABLE check in {{NMTimelinePublisher.reportContainerResourceUsage}}
> is never hit. Proper checks need to be added.
> * {{EntityColumnPrefix.METRIC}} always uses LongConverter, but ContainersMonitor publishes
> decimal values for the CPU usage.
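
To illustrate the second point, here is a small sketch of what happens when a fractional CPU percentage ends up in a long-only column. It does not use the real {{LongConverter}}; the class {{MetricRoundingSketch}} and its methods are hypothetical, and rounding is shown only as one possible option, not as the approach taken in the patch.

```java
public class MetricRoundingSketch {
    /** Plain narrowing cast: the fractional part is dropped. */
    static long truncate(float cpuPercent) {
        return (long) cpuPercent;
    }

    /** One possible alternative: round to the nearest whole percent before storing. */
    static long roundToLong(float cpuPercent) {
        return Math.round((double) cpuPercent);
    }

    public static void main(String[] args) {
        System.out.println(truncate(0.75f));     // 0
        System.out.println(roundToLong(0.75f));  // 1
        System.out.println(truncate(12.9f));     // 12
        System.out.println(roundToLong(12.9f));  // 13
    }
}
```

With truncation, any usage below 1% is stored as 0, which is indistinguishable from an idle container.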

This message was sent by Atlassian JIRA