hadoop-yarn-issues mailing list archives

From "Lavkesh Lahngir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4314) Adding container wait time as a metric at queue level and application level.
Date Thu, 19 Nov 2015 11:08:11 GMT

    [ https://issues.apache.org/jira/browse/YARN-4314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15013344#comment-15013344 ]

Lavkesh Lahngir commented on YARN-4314:

Initial thoughts:
An AM sends resource requests with each heartbeat; the RM tries to fulfil the requests and
sends back the response.
We can maintain a data structure called ContainerWaitTime in the AppSchedulingInfo to keep
track of the timestamp of the last heartbeat and the number of pending containers. Resource
requests and resource allocations update the ContainerWaitTime object to increase or decrease
the number of pending containers. With every heartbeat, the total wait time for this attempt
is increased by (pending_containers * (current_timestamp - last_timestamp)), and last_timestamp
is then updated to the current timestamp.
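
The bookkeeping described above could be sketched roughly as follows. This is only an
illustration of the idea, not the patch itself: the method names (onRequest, onAllocation,
onHeartbeat) and the millisecond units are assumptions, and it assumes the wait for the
elapsed interval is charged before the pending count is changed.

```java
/** Hypothetical sketch of the per-attempt wait-time accumulator; not the actual patch. */
final class ContainerWaitTime {
    private long lastTimestampMs;   // timestamp of the previous heartbeat
    private int pendingContainers;  // containers requested but not yet allocated
    private long totalWaitMs;       // accumulated container-milliseconds spent waiting

    ContainerWaitTime(long startTimestampMs) {
        this.lastTimestampMs = startTimestampMs;
    }

    /** A resource request increases the number of pending containers. */
    void onRequest(int containers) {
        pendingContainers += containers;
    }

    /** An allocation decreases the number of pending containers (never below zero). */
    void onAllocation(int containers) {
        pendingContainers = Math.max(0, pendingContainers - containers);
    }

    /** On each heartbeat, charge the elapsed interval to every pending container. */
    void onHeartbeat(long currentTimestampMs) {
        totalWaitMs += (long) pendingContainers * (currentTimestampMs - lastTimestampMs);
        lastTimestampMs = currentTimestampMs;
    }

    long getTotalWaitMs() {
        return totalWaitMs;
    }

    int getPendingContainers() {
        return pendingContainers;
    }
}
```

For example, with 3 containers pending across a 1000 ms heartbeat interval the total grows by
3000; if 2 of them are then allocated and the next heartbeat arrives 2000 ms later, it grows
by a further 2000, giving 5000 container-milliseconds in total.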

Every attempt will maintain this data structure, similar to memory-seconds and vcore-seconds.
In the AppImpl class, there is a method called getAppMetrics() where we will aggregate the
wait time from all the attempts and return it.

For AM container wait time, we need to add an additional parameter called scheduledTime. In
the getAppMetrics() method, we can get the total AM container wait time by summing up
(attempt_scheduledTime - attempt_startedTime) over all attempts. If an attempt is not yet
scheduled, the current time is used in place of scheduledTime.
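
A minimal sketch of that summation is below. The Attempt holder, the NOT_SCHEDULED sentinel
and the totalAmWaitMs helper are hypothetical names used only for illustration; the real
attempt classes and the getAppMetrics() signature are not shown here.

```java
import java.util.List;

/** Hypothetical sketch of the AM wait-time aggregation; not the actual patch. */
final class AmWaitTime {
    /** Assumed sentinel meaning "this attempt has not been scheduled yet". */
    static final long NOT_SCHEDULED = -1L;

    /** Minimal stand-in for an application attempt's timestamps. */
    static final class Attempt {
        final long startedTimeMs;
        final long scheduledTimeMs;

        Attempt(long startedTimeMs, long scheduledTimeMs) {
            this.startedTimeMs = startedTimeMs;
            this.scheduledTimeMs = scheduledTimeMs;
        }
    }

    /** Sums (scheduledTime - startedTime), using the current time for unscheduled attempts. */
    static long totalAmWaitMs(List<Attempt> attempts, long currentTimeMs) {
        long total = 0L;
        for (Attempt a : attempts) {
            long end = (a.scheduledTimeMs == NOT_SCHEDULED) ? currentTimeMs : a.scheduledTimeMs;
            total += end - a.startedTimeMs;
        }
        return total;
    }
}
```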

To add these new metrics to the queue, we just need to update the queue_metrics object; the
values will then be aggregated at the queue level.

For RM recovery, we will need to save these metrics to the state store, similar to the other
metrics of the attempt (memory-seconds and vcore-seconds).
A few more classes need to be touched to implement the above, but the core idea remains the
same. Most of the code is independent of the scheduler, apart from a few lines added in each
of the scheduler implementations.

I have implemented an initial version. I will put out the patch once I have tested it completely.


> Adding container wait time as a metric at queue level and application level.
> ----------------------------------------------------------------------------
>                 Key: YARN-4314
>                 URL: https://issues.apache.org/jira/browse/YARN-4314
>             Project: Hadoop YARN
>          Issue Type: New Feature
>            Reporter: Lavkesh Lahngir
>            Assignee: Lavkesh Lahngir
> There is a need for adding the container wait-time which can be tracked at the queue
> and application level.
> An application can have two kinds of wait times. One is AM wait time after submission
> and another is total container wait time between AM asking for containers and getting them.

This message was sent by Atlassian JIRA
