hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4091) Add REST API to retrieve scheduler activity
Date Mon, 01 Aug 2016 18:35:21 GMT

    [ https://issues.apache.org/jira/browse/YARN-4091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402597#comment-15402597 ]

Wangda Tan commented on YARN-4091:
----------------------------------

[~eepayne],

Thank you so much for reviewing this JIRA.

bq. I would be interested to know how you gathered this information
We're using SLS to simulate a 2k-node cluster, and added some debug logging to print the time (in
nanoseconds) taken by each scheduler allocation.

bq. Also, how are you limiting the number of nodes whose state is being logged?
We deliberately designed the REST API for this JIRA to limit the number of nodes being recorded concurrently.
The goal of this JIRA is to return human-readable results and avoid a noticeable slowdown of
the scheduler. So each request records only one node heartbeat. Too much data
(like 2000 node heartbeats per sec) returned by the scheduler would definitely not be readable by
users.
With this design, the number of node allocations recorded per sec is bounded by the number of
requests sent per sec.
So from my perspective, recording 2,000 nodes at once may not be a valid use case.
Does that make sense to you? And if you're concerned about this API being abused by users,
we can add ACLs or traffic control on the client (web UI) side or the server side.
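
To illustrate the "one request records one heartbeat" idea, here is a minimal sketch. It is not the code in the patch: the class name OneShotNodeRecorder and the methods requestRecording/shouldRecord are hypothetical, and it only assumes that pending requests are kept in a concurrent set keyed by node ID.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: names are illustrative and not taken from the YARN-4091 patch.
final class OneShotNodeRecorder {
  // Node IDs that have a pending "record the next heartbeat" request.
  private final Set<String> pending = ConcurrentHashMap.newKeySet();

  // Called by the REST layer: ask to record the next heartbeat of one node.
  void requestRecording(String nodeId) {
    pending.add(nodeId);
  }

  // Called on every node heartbeat. Returns true at most once per request,
  // so a single request yields a single recorded heartbeat.
  boolean shouldRecord(String nodeId) {
    // remove() is atomic, so only one caller observes true for a given request.
    return pending.remove(nodeId);
  }

  public static void main(String[] args) {
    OneShotNodeRecorder recorder = new OneShotNodeRecorder();
    recorder.requestRecording("node-17");
    System.out.println(recorder.shouldRecord("node-17")); // true: first heartbeat after the request
    System.out.println(recorder.shouldRecord("node-17")); // false: request already consumed
    System.out.println(recorder.shouldRecord("node-42")); // false: never requested
  }
}
```

With a structure like this, the amount of recorded data is driven by the request rate rather than by the cluster size, which is why ACLs or traffic control on requests would be enough to bound it.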

bq. I am concerned about the performance load this feature will add to the resource manager.
I have analyzed the code and experimented with the feature on a 3-node cluster. It appears
that the state is being recorded for every node on every heartbeat...
If you take a look at the implementation, startNodeUpdateRecording/finishNodeUpdateRecording
only check whether a key exists in a ConcurrentHashMap when node recording is not enabled. In
our performance test, we didn't see it add any overhead compared to the original scheduler
code without the patch applied. Also, I just wrote a quick test:
{code}
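    import java.util.ArrayList;
    import java.util.List;
    import java.util.Random;
    import java.util.concurrent.ConcurrentHashMap;

    // Class name is illustrative; the original snippet was a bare method body.
    public class MapGetTest {
      public static void main(String[] args) {
        ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
        Random random = new Random();
        List<String> arr = new ArrayList<>();

        // Populate the map with 100k random keys, remembering each key.
        for (int i = 0; i < 100000; i++) {
          String s = String.valueOf(random.nextFloat());
          map.put(s, s);
          arr.add(s);
        }

        // Time 100k lookups (wall clock, no JIT warm-up).
        long time = System.currentTimeMillis();
        for (String s : arr) {
          map.get(s);
        }
        System.out.println(System.currentTimeMillis() - time);
      }
    }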
{code}

The total time spent on 100k get operations is around 16 ms on my laptop, so each get takes only
about 160 ns.
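
To put that number in context, here is a rough back-of-the-envelope sketch. It uses the 16 ms / 100k figure above and the 2000 node heartbeats per sec mentioned earlier. The two lookups per heartbeat (one for start, one for finish) is an assumption for illustration, and the class and method names are hypothetical.

```java
// Hypothetical sketch: estimates the lookup cost per second of wall time.
final class LookupOverheadEstimate {
  // Average cost of one get, in nanoseconds, from a total time in milliseconds.
  static long perGetNanos(long totalMillis, long operations) {
    return totalMillis * 1_000_000L / operations;
  }

  // Total nanoseconds spent on lookups during one second of heartbeats.
  static long lookupNanosPerSecond(long perGetNanos, long heartbeatsPerSecond,
      long lookupsPerHeartbeat) {
    return perGetNanos * heartbeatsPerSecond * lookupsPerHeartbeat;
  }

  public static void main(String[] args) {
    long perGet = perGetNanos(16, 100_000);              // 160 ns
    // Assumption: 2 lookups per heartbeat (start + finish).
    long perSec = lookupNanosPerSecond(perGet, 2_000, 2); // 640000 ns
    System.out.println(perGet + " ns per get");
    System.out.println(perSec + " ns of lookups per second");
  }
}
```

Under these assumptions the check costs about 0.64 ms per second of scheduling, or roughly 0.06% of one thread. The measurement itself is only approximate, since a single timed pass with System.currentTimeMillis() has no JIT warm-up and millisecond resolution.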


> Add REST API to retrieve scheduler activity
> -------------------------------------------
>
>                 Key: YARN-4091
>                 URL: https://issues.apache.org/jira/browse/YARN-4091
>             Project: Hadoop YARN
>          Issue Type: Sub-task
>          Components: capacity scheduler, resourcemanager
>    Affects Versions: 2.7.0
>            Reporter: Sunil G
>            Assignee: Chen Ge
>         Attachments: Improvement on debugdiagnostic information - YARN.pdf, SchedulerActivityManager-TestReport
v2.pdf, SchedulerActivityManager-TestReport.pdf, YARN-4091-design-doc-v1.pdf, YARN-4091.1.patch,
YARN-4091.2.patch, YARN-4091.3.patch, YARN-4091.4.patch, YARN-4091.5.patch, YARN-4091.5.patch,
YARN-4091.preliminary.1.patch, app_activities.json, node_activities.json
>
>
> As schedulers are improved with various new capabilities, more configurations that tune
the schedulers start to take actions such as limiting container assignment to an application
or introducing a delay in container allocation.
> No clear information is passed down from the scheduler to the outside world under these various
scenarios. This makes debugging much tougher.
> This ticket is an effort to introduce more defined states at the various points in the scheduler
where it skips/rejects container assignment, activates an application, etc. Such information will
help users know what's happening in the scheduler.
> Attaching a short proposal for initial discussion. We would like to improve on this as
we discuss.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

