hadoop-yarn-issues mailing list archives

From "Chen Ge (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity
Date Fri, 22 Jul 2016 20:24:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15390139#comment-15390139 ]

Chen Ge commented on YARN-4091:
-------------------------------

Thanks [~sunilg] for the comments and improvements. Here are the corresponding modifications
and some further comments.

For comment 1, I believe node heartbeats are not processed concurrently. They are handled
sequentially, so {{startNodeUpdateRecording}} is never entered by two node heartbeats at the
same time, and there is no need to synchronize it.

For comment 2, {{activeRecordedNodes}} and {{recordingNodesAllocation}} together ensure that
we record a complete node update after a request. A node is added to {{activeRecordedNodes}}
as soon as the user requests it. In {{startNodeUpdateRecording}}, the node is put into
{{recordingNodesAllocation}} only if {{activeRecordedNodes}} contains it. Without
{{activeRecordedNodes}}, we might begin recording activity in the middle of a node heartbeat,
so it is needed to defer recording until the next node heartbeat.
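
To make the two-stage hand-off concrete, here is a minimal sketch of the idea, not the code
in the patch. Only {{activeRecordedNodes}}, {{recordingNodesAllocation}} and
{{startNodeUpdateRecording}} are names from this discussion. The class name, the other method
names, the String-based types, and the choice to remove a node from {{activeRecordedNodes}}
once recording starts are all assumptions made for illustration. The sketch is also
single-threaded and does not show how a user request would be made safe against a running
heartbeat.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the two-stage recording described above; not the YARN-4091 patch.
class NodeRecordingSketch {
  // Nodes a user has asked to record; recording has not started for them yet.
  private final Set<String> activeRecordedNodes = new HashSet<>();
  // Nodes whose current heartbeat is being recorded, with the activities seen so far.
  private final Map<String, List<String>> recordingNodesAllocation = new HashMap<>();

  // Assumed entry point for a user request: only marks the node as pending.
  void requestRecording(String nodeId) {
    activeRecordedNodes.add(nodeId);
  }

  // Called at the start of a node heartbeat: promotes a pending node to "recording".
  // Removing the node from the pending set here is an assumption of this sketch.
  void startNodeUpdateRecording(String nodeId) {
    if (activeRecordedNodes.remove(nodeId)) {
      recordingNodesAllocation.put(nodeId, new ArrayList<>());
    }
  }

  // Called during a heartbeat: ignored unless recording began at the heartbeat's start.
  void addActivity(String nodeId, String activity) {
    List<String> activities = recordingNodesAllocation.get(nodeId);
    if (activities != null) {
      activities.add(activity);
    }
  }

  // Called at the end of a heartbeat: returns what was recorded, or an empty list.
  List<String> finishNodeUpdateRecording(String nodeId) {
    List<String> activities = recordingNodesAllocation.remove(nodeId);
    return activities == null ? new ArrayList<>() : activities;
  }
}
```

With this shape, a request that arrives in the middle of a heartbeat records nothing for that
heartbeat, and the following heartbeat is recorded from start to finish.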

We have addressed comments 3, 4, 5 and 7 based on the suggestions.

For comment 6, we have added a new intermediate utility class called {{ActivitiesLogger}}.
Its operations are grouped into three classes: APP, QUEUE and NODE. They handle the "start",
"add" or "finish" operations from the APP, QUEUE and NODE perspectives. CapacityScheduler,
Queue and ContainerAllocator simply call the helper functions in {{ActivitiesLogger}}, and
{{ActivitiesLogger}} in turn invokes the specific operations in {{ActivitiesManager}}.
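
As a rough illustration of this layering, the sketch below shows a static helper class with
APP, QUEUE and NODE groups that forward to a manager. It is not the code in the patch. The
group names APP, QUEUE and NODE and the operation names startRecording, addActivity and
finishRecording appear in this discussion, but every class name, helper method name and
parameter list shown here is hypothetical, and the manager is a stand-in that only collects
strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ActivitiesManager; the signatures are assumptions.
class ActivitiesManagerSketch {
  final List<String> events = new ArrayList<>();

  void startRecording(String nodeId) {
    events.add("start " + nodeId);
  }

  void addActivity(String nodeId, String source, String name, String state) {
    events.add(source + " " + name + " " + state + " on " + nodeId);
  }

  void finishRecording(String nodeId) {
    events.add("finish " + nodeId);
  }
}

// Hypothetical facade in the spirit of ActivitiesLogger: callers use short
// perspective-specific helpers instead of talking to the manager directly.
final class ActivitiesLoggerSketch {
  private ActivitiesLoggerSketch() {
  }

  // Helpers for application-level activities.
  static final class APP {
    static void recordAppActivity(ActivitiesManagerSketch manager, String nodeId,
        String appId, String state) {
      manager.addActivity(nodeId, "APP", appId, state);
    }
  }

  // Helpers for queue-level activities.
  static final class QUEUE {
    static void recordQueueActivity(ActivitiesManagerSketch manager, String nodeId,
        String queueName, String state) {
      manager.addActivity(nodeId, "QUEUE", queueName, state);
    }
  }

  // Helpers that open and close the recording of one node update.
  static final class NODE {
    static void startNodeUpdateRecording(ActivitiesManagerSketch manager, String nodeId) {
      manager.startRecording(nodeId);
    }

    static void finishNodeUpdateRecording(ActivitiesManagerSketch manager, String nodeId) {
      manager.finishRecording(nodeId);
    }
  }
}
```

A scheduler-side caller would then write something like
{{ActivitiesLoggerSketch.QUEUE.recordQueueActivity(manager, "n1", "root.a", "ACCEPTED")}},
which keeps the bookkeeping details out of the scheduling code.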

Also, for comment 8, we have simplified the activities API. We removed the updateState operation
and kept only startRecording, addActivity, finishNodeAllocation and finishRecording. We also
combined similar calls and trimmed the passed parameters to keep them as clean as possible.

As for the minor nits, we have changed the function name as suggested.

Thanks again for the valuable comments.

> Improvement: Introduce more debug/diagnostics information to detail out scheduler activity
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-4091
>                 URL: https://issues.apache.org/jira/browse/YARN-4091
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Chen Ge
>         Attachments: Improvement on debugdiagnostic information - YARN.pdf, YARN-4091-design-doc-v1.pdf, YARN-4091.1.patch, YARN-4091.preliminary.1.patch
>
>
> As schedulers are improved with various new capabilities, more configurations that tune
> the schedulers start to take actions such as limiting the containers assigned to an
> application or introducing a delay in container allocation.
> No clear information is passed from the scheduler to the outside world in these
> scenarios, which makes debugging much harder.
> This ticket is an effort to introduce more clearly defined states at the points in the
> scheduler where it skips or rejects a container assignment, activates an application, etc.
> Such information will help users understand what is happening in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve on this as
> we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

