hadoop-yarn-issues mailing list archives

From "Chen Ge (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4091) Improvement: Introduce more debug/diagnostics information to detail out scheduler activity
Date Tue, 26 Jul 2016 00:09:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15392897#comment-15392897 ]

Chen Ge commented on YARN-4091:
-------------------------------

We also ran the scheduler load simulator (SLS) with fake data. The simulated cluster has 2000
nodes in total, producing 2000 node heartbeats per second.

Two APIs are provided for viewing activities. The first records the activities of a single
node heartbeat. The second records the activities of one application over a period of time,
given an applicationId and a time period.
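
The two recording modes can be illustrated with a minimal sketch. Everything in it is
hypothetical: the class ActivityRecorder, its method names, and the string messages are
made up for illustration and are not taken from the YARN-4091 patches. It only shows the
idea of a one-shot recording for a node heartbeat and a time-bounded recording for an
application.

```python
import time
from collections import defaultdict


class ActivityRecorder:
    """Hypothetical recorder; names here do not come from the YARN patch."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._node_requests = set()    # nodes whose next heartbeat is recorded
        self._app_deadlines = {}       # app_id -> time at which recording stops
        self.node_activities = defaultdict(list)
        self.app_activities = defaultdict(list)

    def record_next_node_heartbeat(self, node_id):
        """Mode 1: record the activities of one heartbeat of this node."""
        self._node_requests.add(node_id)

    def record_app_activities(self, app_id, duration_s):
        """Mode 2: record this application's activities for duration_s seconds."""
        self._app_deadlines[app_id] = self._clock() + duration_s

    def on_activity(self, node_id, app_id, message):
        """Called by the (hypothetical) scheduler for each decision it makes."""
        if node_id in self._node_requests:
            self.node_activities[node_id].append(message)
        deadline = self._app_deadlines.get(app_id)
        if deadline is not None:
            if self._clock() <= deadline:
                self.app_activities[app_id].append(message)
            else:
                # The requested period has elapsed; stop recording this app.
                del self._app_deadlines[app_id]

    def finish_node_heartbeat(self, node_id):
        """End of a heartbeat: the one-shot node request is consumed."""
        self._node_requests.discard(node_id)
```

When nothing is being recorded, on_activity in this sketch does only a set lookup and a
dict lookup.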

Running the previous patch without changes (the baseline), one node heartbeat costs
approximately 0.2 ms. If we record only application activities, the difference in running
time is unnoticeable, less than 0.01 ms. If we record the complete activities of a node
heartbeat, the running time of that heartbeat is 0.6 ms, about 3x the baseline. In practice,
however, only a few nodes' activities will be recorded at the same time. For example, suppose
the activities of 30 nodes are being recorded at the same time (which is already a huge number
to me). The extra cost is 30 x 0.4 ms = 12 ms, compared with 2000 x 0.2 ms = 400 ms for the
2000 node heartbeats, so the time to record activities is small (around 3% more overhead).
It is negligible and acceptable.
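
The overhead estimate can be checked with a short calculation. This is only a sketch using
the figures quoted in this comment; the function and constant names are made up for
illustration.

```python
# Figures quoted in the comment.
NODES = 2000          # node heartbeats per second in the SLS run
BASELINE_MS = 0.2     # cost of one heartbeat without recording
RECORDING_MS = 0.6    # cost of one heartbeat with full node activity recording
RECORDED_NODES = 30   # nodes being recorded at the same time (the example case)


def recording_overhead(nodes, baseline_ms, recording_ms, recorded_nodes):
    """Return (baseline_total_ms, extra_ms, overhead_fraction) for one second."""
    baseline_total = nodes * baseline_ms
    extra = recorded_nodes * (recording_ms - baseline_ms)
    return baseline_total, extra, extra / baseline_total


baseline_total, extra, fraction = recording_overhead(
    NODES, BASELINE_MS, RECORDING_MS, RECORDED_NODES
)
print(f"baseline: {baseline_total:.0f} ms, extra: {extra:.0f} ms, "
      f"overhead: {fraction:.1%}")
```

Changing RECORDED_NODES shows how the overhead scales: it grows linearly with the number of
nodes being recorded.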

> Improvement: Introduce more debug/diagnostics information to detail out scheduler activity
> ------------------------------------------------------------------------------------------
>
>                 Key: YARN-4091
>                 URL: https://issues.apache.org/jira/browse/YARN-4091
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Chen Ge
>         Attachments: Improvement on debugdiagnostic information - YARN.pdf, YARN-4091-design-doc-v1.pdf,
YARN-4091.1.patch, YARN-4091.2.patch, YARN-4091.3.patch, YARN-4091.preliminary.1.patch, app_activities.json,
node_activities.json
>
>
> As schedulers gain new capabilities, more of the configurations that tune them cause the
scheduler to take actions such as limiting container assignment to an application or
delaying container allocation.
> No clear information is passed from the scheduler to the outside world in these
scenarios, which makes debugging much harder.
> This ticket is an effort to introduce more well-defined states at the points in the scheduler
where it skips or rejects container assignment, activates an application, etc. Such information
will help users understand what is happening in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve on this as
we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

