hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4488) CapacityScheduler: Compute per-container allocation latency and roll up to get per-application and per-queue
Date Thu, 01 Mar 2018 22:16:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16382744#comment-16382744 ]

Wangda Tan commented on YARN-4488:

[~manirajv06@gmail.com], thanks for the explanation; I understand the approach better now.

Regarding the metrics, here is the behavior I expect:

The delay of a container should be T1 (container_allocated_time) - T2 (requested time). In your
proposal, T2 is {{time while creating ResourceRequest object}}, which does not look correct to
me. We have to consider a more complex case.

What I expect:
(time=1) An app submits a resource request asking for 5 * 2G containers.
(time=3) 3 containers are allocated; delay of each of the 3 containers = 3-1 = 2. Pending ask = 2.
(time=5) The app updates its ask to 10 containers (instead of 2) at the same priority.
(time=7) 5 containers are allocated:
         2 containers (from the original ask) have delay = 7-1 = 6
         3 containers (from the additional ask) have delay = 7-5 = 2
This is a common scenario for apps that make additional asks for failed containers (for example
MR): if a container fails, the app asks for additional containers at the same priority
(FAILED_MAPPER_PRIORITY), so we should consider it.
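
To make the expected semantics concrete, here is a minimal Java sketch of per-request bookkeeping for a single priority. It is not taken from any patch on this Jira: the class {{PerRequestDelayTracker}} and its methods are hypothetical, and it assumes that allocations are attributed to the oldest outstanding ask first, which is what the example above implies.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper (not a YARN class): records the request time of each ask batch. */
final class PerRequestDelayTracker {
  /** A group of containers that were asked for at the same time. */
  private static final class Batch {
    final long requestTime;
    int remaining;

    Batch(long requestTime, int remaining) {
      this.requestTime = requestTime;
      this.remaining = remaining;
    }
  }

  private final Deque<Batch> batches = new ArrayDeque<>();
  private int pending = 0;

  /** Sets the total pending ask to newPending at time now. */
  void updateAsk(long now, int newPending) {
    if (newPending < 0) {
      throw new IllegalArgumentException("newPending must be >= 0");
    }
    if (newPending > pending) {
      // Only the additional containers get the new request time.
      batches.addLast(new Batch(now, newPending - pending));
    } else {
      // Assumption: a reduced ask cancels the most recently requested containers.
      int toDrop = pending - newPending;
      while (toDrop > 0 && !batches.isEmpty()) {
        Batch newest = batches.peekLast();
        int dropped = Math.min(toDrop, newest.remaining);
        newest.remaining -= dropped;
        toDrop -= dropped;
        if (newest.remaining == 0) {
          batches.removeLast();
        }
      }
    }
    pending = newPending;
  }

  /** Allocates up to count containers at time now and returns one delay per container. */
  List<Long> allocate(long now, int count) {
    List<Long> delays = new ArrayList<>();
    while (count > 0 && !batches.isEmpty()) {
      Batch oldest = batches.peekFirst();
      int served = Math.min(count, oldest.remaining);
      for (int i = 0; i < served; i++) {
        delays.add(now - oldest.requestTime);
      }
      oldest.remaining -= served;
      count -= served;
      pending -= served;
      if (oldest.remaining == 0) {
        batches.removeFirst();
      }
    }
    return delays;
  }
}
```

Replaying the example with {{updateAsk(1, 5)}}, {{allocate(3, 3)}}, {{updateAsk(5, 10)}} and {{allocate(7, 5)}} yields delays of 2, 2, 2 for the first allocation and 6, 6, 2, 2, 2 for the second.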

The downside of this approach is that it needs additional memory to record an accurate requested
time for each resource request. An alternative approach is to remember an average requested time
for each priority. (Assume X containers were requested at time T1 and Y additional containers
were requested at time T2, where T1 and T2 here are both request times; the average requested
time is then {{(X * T1 + Y * T2) / (X + Y)}}.)
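
The averaging alternative can be sketched as follows. This is only an illustration under stated assumptions: the class {{AverageRequestTime}} is hypothetical, X is taken to be the number of containers still pending when the additional ask arrives, and every container allocated later is charged the same estimated delay.

```java
/** Hypothetical helper (not a YARN class): one running average request time per priority. */
final class AverageRequestTime {
  private int pending = 0;            // X: containers still outstanding
  private double avgRequestTime = 0;  // weighted average of their request times

  /** Records added new containers requested at time now (Y at T2 in the formula). */
  void addAsk(long now, int added) {
    if (added <= 0) {
      return;
    }
    // (X * T1 + Y * T2) / (X + Y), with T1 being the current average.
    avgRequestTime = (pending * avgRequestTime + (double) added * now) / (pending + added);
    pending += added;
  }

  /** Allocates up to count containers at time now; returns the estimated delay of each. */
  double allocate(long now, int count) {
    int served = Math.min(count, pending);
    if (served == 0) {
      return 0.0; // nothing was pending, so there is no delay to report
    }
    pending -= served;
    return now - avgRequestTime;
  }
}
```

With 2 containers pending from time 1 and 8 more requested at time 5, the average requested time is (2 * 1 + 8 * 5) / 10 = 4.2, so each of the 5 containers allocated at time 7 is charged about 2.8. Exact per-request tracking would give 6, 6, 2, 2, 2 (mean 3.6), which shows the accuracy traded for the constant memory footprint.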

*Regarding implementation:*

I'm not sure whether massive changes are required. Let's figure out the semantics of the delay
first and look at the implementation later.

+ [~sunilg] 

+ [~ywskycn] to the thread: you pinged me offline about metrics-related work before, so I
think you might be interested in this Jira.



> CapacityScheduler: Compute per-container allocation latency and roll up to get per-application and per-queue
> ------------------------------------------------------------------------------------------------------------
>                 Key: YARN-4488
>                 URL: https://issues.apache.org/jira/browse/YARN-4488
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>            Reporter: Karthik Kambatla
>            Assignee: Manikandan R
>            Priority: Major
>         Attachments: YARN-4485.001.patch

