hadoop-yarn-issues mailing list archives

From "Eric Payne (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (YARN-7052) RM SchedulingMonitor gives no indication why the spawned thread crashed.
Date Wed, 23 Aug 2017 19:11:00 GMT

     [ https://issues.apache.org/jira/browse/YARN-7052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eric Payne updated YARN-7052:
-----------------------------
    Description: 
In YARN-7051, we ran into a case where the preemption monitor thread hung with no indication
of why.

The preemption monitor is started by the {{ScheduledExecutorService}} created in {{SchedulingMonitor#serviceStart}}.
Once an uncaught throwable escapes the task, nothing ever gets the result of the future, so
the throwable is never reported. The thread that ran the preemption monitor never dies, but
the task never gets rescheduled.

If {{HadoopExecutors}} were used, it would at least provide a {{HadoopScheduledThreadPoolExecutor}}
that logs the exception if one happens.

  was:
In YARN-7051, we ran into a case where the preemption monitor thread hung with no indication
of why. This was because the preemption monitor is started by the {{SchedulingExecutorService}}
from {{SchedulingMonitor#serviceStart}}, and then nothing ever gets the result of the future
or allows it to throw an exception if needed.

At least with {{HadoopExecutor}}, it will provide a {{HadoopScheduledThreadPoolExecutor}}
that logs the exception if one happens.

        Summary: RM SchedulingMonitor gives no indication why the spawned thread crashed.
 (was: RM SchedulingMonitor should use HadoopExecutors when creating ScheduledExecutorService)

Using {{HadoopExecutors}} may not be feasible.

The following is used to launch the preemption thread:
{code:title=SchedulingMonitor#serviceStart}
    ses = Executors.newSingleThreadScheduledExecutor(new ThreadFactory() {
...
    handler = ses.scheduleAtFixedRate(new PreemptionChecker(),
        0, monitorInterval, TimeUnit.MILLISECONDS);
{code}

{{HadoopExecutors}} provides a {{newSingleThreadScheduledExecutor}} method, but it just
turns around and calls {{Executors#newSingleThreadScheduledExecutor}}. Its return value is
not wrapped in a {{HadoopScheduledThreadPoolExecutor}}, so you don't get the logging benefits
by using {{HadoopExecutors#newSingleThreadScheduledExecutor}}.
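
For illustration, the kind of logging wrapper being discussed can be sketched with plain JDK classes. This is not Hadoop's {{HadoopScheduledThreadPoolExecutor}}; the class name {{LoggingScheduledExecutor}} and the {{Consumer}} sink are hypothetical and stand in for a real logger. The sketch overrides {{afterExecute}} and unwraps the failure stored in the task's future:

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.function.Consumer;

// Hypothetical sketch: a scheduled executor that reports task failures.
class LoggingScheduledExecutor extends ScheduledThreadPoolExecutor {
  private final Consumer<Throwable> sink; // stand-in for LOG.warn(...)

  LoggingScheduledExecutor(int corePoolSize, Consumer<Throwable> sink) {
    super(corePoolSize);
    this.sink = sink;
  }

  @Override
  protected void afterExecute(Runnable r, Throwable t) {
    super.afterExecute(r, t);
    // Scheduled tasks are wrapped in a Future, so a failure is stored in the
    // future rather than passed in as 't'. A periodic task that succeeded is
    // reset and is not done, so the isDone() check avoids blocking on get().
    if (t == null && r instanceof Future<?> && ((Future<?>) r).isDone()) {
      try {
        ((Future<?>) r).get();
      } catch (ExecutionException e) {
        t = e.getCause();
      } catch (CancellationException e) {
        // Cancelled tasks are not failures; nothing to report.
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    if (t != null) {
      sink.accept(t);
    }
  }
}
```

With a wrapper along these lines, the task still stops being rescheduled after a failure, but the cause is at least reported.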

Alternatively, we could have the thread itself catch and handle throwables.

The task launched by {{SchedulingMonitor#serviceStart}} runs {{PreemptionChecker#run}},
which only handles {{YarnRuntimeException}}. Any other throwable will cause the monitor to
stop silently (it appears hung) and never get rescheduled.
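
The behavior can be reproduced outside of YARN with a small, self-contained sketch. The class and method names below are hypothetical; the task simply throws on its second run. Per the JDK contract for {{scheduleAtFixedRate}}, subsequent executions are suppressed once an execution throws, and the throwable only surfaces if something calls {{get()}} on the returned future:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo, not YARN code.
class SuppressedTaskDemo {
  /** Schedules a task that fails on its second run; returns how often it ran. */
  static int runsBeforeSuppression() {
    ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
    AtomicInteger runs = new AtomicInteger();
    ScheduledFuture<?> handler = ses.scheduleAtFixedRate(() -> {
      if (runs.incrementAndGet() == 2) {
        throw new IllegalStateException("simulated failure");
      }
    }, 0, 10, TimeUnit.MILLISECONDS);
    try {
      // Blocks until the periodic task fails. Without this call the
      // throwable stays hidden inside the future.
      handler.get();
    } catch (ExecutionException e) {
      System.err.println("Periodic task failed: " + e.getCause());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    try {
      Thread.sleep(100); // leave time for further runs, which never happen
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    ses.shutdownNow();
    return runs.get();
  }
}
```

Calling {{SuppressedTaskDemo.runsBeforeSuppression()}} returns 2: the task runs twice, fails, and is never run again, even though the executor thread is still alive until shutdown.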

I suggest another solution: handle other throwables as well, log them, and either
re-throw or cancel the task.
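
A minimal sketch of the log-and-re-throw variant, assuming a generic wrapper around the periodic task. {{LoggingRunnable}} and its {{Consumer}} logger are hypothetical names, not part of {{SchedulingMonitor}}; in the real code the handling could equally live inside {{PreemptionChecker#run}} and use the class's own logger:

```java
import java.util.function.Consumer;

// Hypothetical sketch: wraps a task so that any throwable is logged
// before it propagates to the executor.
final class LoggingRunnable implements Runnable {
  private final Runnable delegate;
  private final Consumer<Throwable> logger; // stand-in for LOG.error(...)

  LoggingRunnable(Runnable delegate, Consumer<Throwable> logger) {
    this.delegate = delegate;
    this.logger = logger;
  }

  @Override
  public void run() {
    try {
      delegate.run();
    } catch (Throwable t) {
      // Record the cause first, then re-throw so the executor still
      // cancels the periodic task instead of running in a bad state.
      logger.accept(t);
      throw t;
    }
  }
}
```

Re-throwing keeps the existing behavior (the task stops) while making the reason visible. Swallowing the throwable instead would let the task keep being rescheduled, which is a different design choice.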

> RM SchedulingMonitor gives no indication why the spawned thread crashed.
> ------------------------------------------------------------------------
>
>                 Key: YARN-7052
>                 URL: https://issues.apache.org/jira/browse/YARN-7052
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: yarn
>            Reporter: Eric Payne
>            Assignee: Eric Payne
>
> In YARN-7051, we ran into a case where the preemption monitor thread hung with no indication
> of why.
> The preemption monitor is started by the {{ScheduledExecutorService}} created in {{SchedulingMonitor#serviceStart}}.
> Once an uncaught throwable escapes the task, nothing ever gets the result of the future, so
> the throwable is never reported. The thread that ran the preemption monitor never dies, but
> the task never gets rescheduled.
> If {{HadoopExecutors}} were used, it would at least provide a {{HadoopScheduledThreadPoolExecutor}}
> that logs the exception if one happens.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


