hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity
Date Wed, 13 Jul 2016 14:43:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375131#comment-15375131 ]

Sunil G commented on YARN-4091:

Thanks [~ChenGe] for the patch and detailed doc.

A few initial comments; I will share more feedback soon.

*REST API comments:*
1. For a REST query ending with {{activities?nodeId=node-87}}, I think it may scan all nodes
on that host if multiple NMs are running on the same node. Is that correct?
2. If we support the above option, could we pass node names in comma-separated form to
{{nodeId}}, like {{activities?nodeId=node-87,node-88}}? Maybe we can define a scope here
for the number of node managers to query, as the response output also needs to be simple to understand.
3. For {{app-activities?appId=application_1468198570845_0022}}, I think the output is different from
the node query? Could you also please attach the REST output for the app and node scenarios?
4. Sometimes we may look for relaxed scheduling by considering missed opportunities. In that case
a full round of node heartbeats may be needed to get an allocation in a few cases
(rack-local, default partition from a shared label, etc.). It would be better to add an option
such as collecting scheduler activity for an app until its missed opportunity count is 0. Thoughts?
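
To make point 2 concrete, here is a minimal sketch of how a comma-separated {{nodeId}} value could be parsed and capped. The class {{NodeIdParam}}, the method {{parse}} and the limit of 10 are all hypothetical names and values chosen for illustration; none of them come from the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the YARN-4091 patch.
final class NodeIdParam {
  // Assumed cap on how many node managers a single query may cover.
  static final int MAX_NODES_PER_QUERY = 10;

  // Splits "node-87,node-88" into distinct, trimmed node names in request order.
  static List<String> parse(String raw) {
    if (raw == null) {
      throw new IllegalArgumentException("nodeId must not be null");
    }
    Set<String> ids = new LinkedHashSet<>();
    for (String part : raw.split(",")) {
      String id = part.trim();
      if (!id.isEmpty()) {
        ids.add(id); // the set drops duplicates but keeps first-seen order
      }
    }
    if (ids.isEmpty()) {
      throw new IllegalArgumentException("nodeId must name at least one node");
    }
    if (ids.size() > MAX_NODES_PER_QUERY) {
      throw new IllegalArgumentException(
          "at most " + MAX_NODES_PER_QUERY + " nodes may be queried at once");
    }
    return new ArrayList<>(ids);
  }
}
```

Rejecting an oversized list up front keeps the response small enough to read, which is the concern raised in point 2.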

*General comments:*
1. ActivityManager is a class which holds all the information about tracked scheduling
activities. Over time, I think we might need to consider cases like cleanup of outstanding
requests, and internal aggregation to compact and re-order collected data across heartbeats. For
all these cases, I think it is better to make ActivityManager an extended service of the
scheduler, so it can start a thread associated with the service to do all monitoring and cleanup.
This is just a thought; please feel free to share your opinion, as it is a good-to-have option.
2. I am in favor of keeping the current direct, simple calls to start/update/stop scheduling
activity. But would it be better to define a read-write interface that clearly states who
will read the data and who can write to the activity manager? On second thought, could
we raise events to ActivityManager from the scheduler and make writes asynchronous?
It may become clearer and simpler. Thoughts?
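
A rough sketch of what I mean in both points, using plain {{java.util.concurrent}} rather than the Hadoop service classes so that it stands alone. All names here ({{ActivityWriter}}, {{ActivityReader}}, {{ActivityManagerSketch}}, {{purgeOlderThan}}) are hypothetical and are not taken from the patch. A single worker thread applies the writes asynchronously and also runs the periodic cleanup.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Write side, intended for the scheduler only (hypothetical interface).
interface ActivityWriter {
  void recordActivity(String appId, String diagnostic);
}

// Read side, intended for REST handlers (hypothetical interface).
interface ActivityReader {
  List<String> getActivities(String appId);
}

// Illustrative stand-in for an ActivityManager that runs as its own service.
final class ActivityManagerSketch implements ActivityWriter, ActivityReader {
  private static final class Entry {
    final long timestamp;
    final String diagnostic;
    Entry(long timestamp, String diagnostic) {
      this.timestamp = timestamp;
      this.diagnostic = diagnostic;
    }
  }

  private final Map<String, List<Entry>> activities = new ConcurrentHashMap<>();
  // One thread applies writes and runs cleanup, so the two never interleave.
  private final ScheduledExecutorService worker =
      Executors.newSingleThreadScheduledExecutor();
  private final long ttlMillis;

  ActivityManagerSketch(long ttlMillis) {
    this.ttlMillis = ttlMillis;
  }

  // Starts the periodic cleanup of entries older than the TTL.
  void start(long cleanupIntervalMillis) {
    worker.scheduleWithFixedDelay(
        () -> purgeOlderThan(System.currentTimeMillis() - ttlMillis),
        cleanupIntervalMillis, cleanupIntervalMillis, TimeUnit.MILLISECONDS);
  }

  // Asynchronous write: the scheduler thread only enqueues the event.
  @Override
  public void recordActivity(String appId, String diagnostic) {
    long now = System.currentTimeMillis();
    worker.execute(() -> activities
        .computeIfAbsent(appId, k -> new CopyOnWriteArrayList<>())
        .add(new Entry(now, diagnostic)));
  }

  // Read path: returns a copy of the diagnostics recorded so far for the app.
  @Override
  public List<String> getActivities(String appId) {
    List<String> out = new ArrayList<>();
    for (Entry e : activities.getOrDefault(appId, Collections.emptyList())) {
      out.add(e.diagnostic);
    }
    return out;
  }

  // Drops entries recorded before the cutoff, then drops apps left empty.
  void purgeOlderThan(long cutoffMillis) {
    for (List<Entry> entries : activities.values()) {
      entries.removeIf(e -> e.timestamp < cutoffMillis);
    }
    activities.values().removeIf(List::isEmpty);
  }

  // Stops cleanup and waits for already queued writes to be applied.
  void stop() {
    worker.shutdown();
    try {
      worker.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

With this split the scheduler would hold only an {{ActivityWriter}} reference and the REST layer only an {{ActivityReader}}, which is one way to make the read and write roles explicit. The trade-off is that a read may briefly lag behind the latest scheduler decision.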

> Improvement: Introduce more debug/diagnostics information to detail out scheduler activity
> ------------------------------------------------------------------------------------------
>                 Key: YARN-4091
>                 URL: https://issues.apache.org/jira/browse/YARN-4091
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Chen Ge
>         Attachments: Improvement on debugdiagnostic information - YARN.pdf, YARN-4091-design-doc-v1.pdf,
> As schedulers are improved with various new capabilities, more of the configurations that tune
> the schedulers start to take effect, such as limiting container assignment to an application
> or introducing a delay before allocating a container.
> No clear information is passed from the scheduler to the outside world in these
> scenarios, which makes debugging much harder.
> This ticket is an effort to introduce more well-defined states at the various points in the scheduler
> where it skips or rejects container assignment, activates an application, etc. Such information will
> help users know what is happening in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve on this as
> we discuss.
