hadoop-yarn-issues mailing list archives

From "Chen Ge (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-4091) Add REST API to retrieve scheduler activity
Date Mon, 01 Aug 2016 17:14:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402442#comment-15402442 ]

Chen Ge edited comment on YARN-4091 at 8/1/16 5:14 PM:
-------------------------------------------------------

Thanks [~Sunil G] for the tests and comments. I have modified the patch based on your suggestions.

Items 1, 2, 3, and 6 have been addressed.
For 4, *finalAllocationState* is the final state of one allocation. An allocation can fail to
allocate any containers because of queue issues, in which case it never reaches the application
level. If we renamed it to *finalAppAllocationState*, the name would not properly describe a
condition that relates only to the queue.
For 5, we also think *allocationState* is meaningful. If it is accepted, the allocation
process has successfully moved on to the next level. If it fails at the queue level, we need a
state to indicate that, since the process does not always reach the application level.
For 7 and 9, adding this information would be helpful, but it would require large changes to
the current implementation and may need further code optimization. I am afraid I cannot
complete it in the limited time available. I believe there will be more thoughts and
improvements in the future.
For 8, it is missing because the second app is not added to the application allocation list
during node heartbeat. Until the AM resource has been successfully allocated, the app has no
activity in node heartbeat, so there is nothing to record for it.
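
To make the distinction in 4, 5, and 8 concrete, here is a minimal sketch in Python. It is not code from the patch: the class names, the function name, and the three state values are hypothetical and chosen only for illustration. It assumes a single queue level followed by an application level, and it collapses every non-allocated outcome into one REJECTED state.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class State(Enum):
    # Hypothetical state values, not the ones defined by the patch.
    ACCEPTED = "ACCEPTED"    # level passed; the attempt moves to the next level
    REJECTED = "REJECTED"    # level stopped the attempt
    ALLOCATED = "ALLOCATED"  # a container was assigned


@dataclass
class App:
    name: str
    am_allocated: bool   # False while the AM resource is not yet allocated
    fits_on_node: bool   # whether the pending request fits on the heartbeating node


@dataclass
class Activity:
    level: str               # "queue" or "app"
    name: str
    allocation_state: State  # per-level state (item 5)


@dataclass
class Allocation:
    activities: List[Activity] = field(default_factory=list)

    @property
    def final_allocation_state(self) -> State:
        # One summary state for the whole attempt, whichever level ended it
        # (item 4). Simplification: anything that is not ALLOCATED is REJECTED.
        if any(a.allocation_state is State.ALLOCATED for a in self.activities):
            return State.ALLOCATED
        return State.REJECTED


def on_node_heartbeat(queue: str, queue_within_limits: bool,
                      apps: List[App]) -> Allocation:
    allocation = Allocation()
    if not queue_within_limits:
        # Fails at the queue level: the application level is never reached.
        allocation.activities.append(Activity("queue", queue, State.REJECTED))
        return allocation
    allocation.activities.append(Activity("queue", queue, State.ACCEPTED))
    # Apps whose AM is not yet allocated are not in the allocation list,
    # so no activity is recorded for them (item 8).
    for app in (a for a in apps if a.am_allocated):
        if app.fits_on_node:
            allocation.activities.append(Activity("app", app.name, State.ALLOCATED))
            break
        allocation.activities.append(Activity("app", app.name, State.REJECTED))
    return allocation
```

In this sketch, calling on_node_heartbeat("root.a", False, [...]) records a single queue-level activity and its final state is REJECTED, which is the case an app-specific name such as *finalAppAllocationState* would describe poorly.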

Thanks again for the detailed tests!


> Add REST API to retrieve scheduler activity
> -------------------------------------------
>
>                 Key: YARN-4091
>                 URL: https://issues.apache.org/jira/browse/YARN-4091
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Chen Ge
>         Attachments: Improvement on debugdiagnostic information - YARN.pdf,
>                      SchedulerActivityManager-TestReport v2.pdf,
>                      SchedulerActivityManager-TestReport.pdf, YARN-4091-design-doc-v1.pdf,
>                      YARN-4091.1.patch, YARN-4091.2.patch, YARN-4091.3.patch,
>                      YARN-4091.4.patch, YARN-4091.5.patch, YARN-4091.5.patch,
>                      YARN-4091.preliminary.1.patch, app_activities.json, node_activities.json
>
>
> As schedulers are improved with various new capabilities, more of the configurations that
> tune the schedulers start to take actions such as limiting container assignment to an
> application or introducing a delay before allocating a container.
> No clear information is passed from the scheduler to the outside world in these scenarios,
> which makes debugging much harder.
> This ticket is an effort to introduce better-defined states at the points in the scheduler
> where it skips or rejects a container assignment, activates an application, etc. Such
> information will help users understand what is happening in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve on this as
> we discuss.




