hadoop-yarn-issues mailing list archives

From "Wangda Tan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5543) ResourceManager SchedulingMonitor could potentially terminate the preemption checker thread
Date Tue, 23 Aug 2016 17:29:20 GMT

    [ https://issues.apache.org/jira/browse/YARN-5543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433248#comment-15433248 ]

Wangda Tan commented on YARN-5543:

Thanks [~mshen] for the patch. The patch looks good. I also added you to the contributor
list so you can assign tasks to yourself in the future.

I just noticed there are no tests to make sure the scheduling monitor works well after it
is started. It would be better to add a test to make sure the monitor policy is invoked
once the service is started.
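
As a rough illustration of the kind of check meant here, below is a minimal, self-contained sketch. The Policy and Monitor types are hypothetical stand-ins and not the actual YARN classes; a real test would more likely start SchedulingMonitor itself with a mocked policy. The sketch only shows the shape of the assertion: start the service, then wait (with a timeout) for the first policy invocation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class MonitorStartCheckSketch {

  /** Hypothetical stand-in for the monitor's policy; not the YARN interface. */
  interface Policy {
    void editSchedule();
  }

  /** Hypothetical stand-in for the monitor service; start() launches the checker. */
  static final class Monitor {
    private final Policy policy;
    private final long intervalMs;
    private volatile boolean stopped;
    private Thread checker;

    Monitor(Policy policy, long intervalMs) {
      this.policy = policy;
      this.intervalMs = intervalMs;
    }

    void start() {
      checker = new Thread(() -> {
        while (!stopped) {
          policy.editSchedule();
          try {
            Thread.sleep(intervalMs);
          } catch (InterruptedException e) {
            return; // stop() interrupts the thread to end the loop promptly
          }
        }
      }, "checker");
      checker.setDaemon(true);
      checker.start();
    }

    void stop() {
      stopped = true;
      checker.interrupt();
    }
  }

  /** Returns true if the policy ran at least once within timeoutMs of start(). */
  static boolean policyInvokedAfterStart(long timeoutMs) {
    CountDownLatch invoked = new CountDownLatch(1);
    // The policy simply records that it was called.
    Monitor monitor = new Monitor(invoked::countDown, 10);
    monitor.start();
    try {
      return invoked.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      monitor.stop();
    }
  }

  public static void main(String[] args) {
    System.out.println("policy invoked: " + policyInvokedAfterStart(2000));
  }
}
```

Waiting on a latch with a timeout avoids a fixed sleep in the test, so the check passes as soon as the first invocation happens and fails only if none arrives in time.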

> ResourceManager SchedulingMonitor could potentially terminate the preemption checker thread
> -------------------------------------------------------------------------------------------
>                 Key: YARN-5543
>                 URL: https://issues.apache.org/jira/browse/YARN-5543
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: capacityscheduler, resourcemanager
>    Affects Versions: 2.7.0, 2.6.1
>            Reporter: Min Shen
>         Attachments: YARN-5543.001.patch
> In SchedulingMonitor.java, when the service starts, it starts a checker thread to perform
> Capacity Scheduler's preemption. However, the implementation of this checker thread has the
> following issue:
> {code}
> while (!stopped && !Thread.currentThread().isInterrupted()) {
>     ....
>     try {
>       Thread.sleep(monitorInterval);
>     } catch (InterruptedException e) {
>       ....
>       break;
>     }
> }
> {code}
> The above code snippet will terminate the checker thread whenever it is interrupted.

> We noticed in our cluster that this could lead to CapacityScheduler's preemption being
> disabled unexpectedly due to the checker thread getting terminated.
> We propose to use ScheduledExecutorService to improve the robustness of this part of
> the code to ensure the liveness of CapacityScheduler's preemption functionality.
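
For readers unfamiliar with the proposed approach, here is a minimal sketch of a periodic checker built on ScheduledExecutorService. It is not the content of YARN-5543.001.patch; the class name, the simulated failure, and the intervals are assumptions made purely for illustration. The executor owns the wait between passes, so the task body has no Thread.sleep whose InterruptedException could end the loop. The task still has to catch its own exceptions, because an exception that escapes a periodic task suppresses all later executions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ResilientMonitorSketch {

  /**
   * Schedules a checker that fails on every odd-numbered pass and reports
   * whether it still completed the requested number of passes in time.
   */
  static boolean survivesFailures(int passes, long intervalMs, long timeoutMs) {
    ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor();
    AtomicInteger invocations = new AtomicInteger();
    CountDownLatch done = new CountDownLatch(passes);

    Runnable checker = () -> {
      try {
        // Stand-in for the real policy call (hypothetical failure pattern).
        if (invocations.incrementAndGet() % 2 == 1) {
          throw new IllegalStateException("simulated policy failure");
        }
      } catch (RuntimeException e) {
        // Swallow and continue; letting this escape would cancel the schedule.
      } finally {
        done.countDown();
      }
    };

    executor.scheduleWithFixedDelay(
        checker, 0, intervalMs, TimeUnit.MILLISECONDS);
    try {
      return done.await(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return false;
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("survived: " + survivesFailures(4, 10, 2000));
  }
}
```

scheduleWithFixedDelay is used here so that a slow pass delays the next one instead of letting runs pile up; scheduleAtFixedRate would be the alternative if a fixed cadence mattered more.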

