hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Miklos Szegedi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-15372) Race conditions and possible leaks in the Shell class
Date Tue, 24 Apr 2018 20:28:00 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450470#comment-16450470

Miklos Szegedi commented on HADOOP-15372:

Thank you for the patch [~ebadger].
I see that you use SIGTERM instead of SIGKILL. This may still leak child processes.
if(!exec.awaitTermination(10, TimeUnit.SECONDS)) {
This may still leak shells, if the creator thread is waiting itself on something else and
it creates the subshell, once the second {{getAllShells()}} is called.

I have a general concern with the original patch that any other users of the Shell class may
need to change their code to use the same pattern. There are at least 5 other users of Shell
in the codebase.

Have you considered using process groups? Process groups may also solve the case when the
node manager signals the container localizer process with SIGKILL. Currently, if this happens
the shells will leak.

> Race conditions and possible leaks in the Shell class
> -----------------------------------------------------
>                 Key: HADOOP-15372
>                 URL: https://issues.apache.org/jira/browse/HADOOP-15372
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.10.0, 3.2.0
>            Reporter: Miklos Szegedi
>            Assignee: Eric Badger
>            Priority: Minor
>         Attachments: HADOOP-15372.001.patch
> YARN-5641 introduced some cleanup code in the Shell class. It has a race condition.
> runCommand()}} can be called while/after {{Shell.getAllShells()}} returned all the shells
to be cleaned up. This new thread can avoid the clean up, so that the process held by it
can be leaked causing leaked localized files/etc.
> I see another issue as well. {{Shell.runCommand()}} has a finally block with a {{
> process.destroy();}} to clean up. However, the try catch block does not cover all instructions
after the process is started, so for example we can exit the thread and leak the process,
if {{
> timeOutTimer.schedule(timeoutTimerTask, timeOutInterval);}} causes an exception.

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: common-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: common-issues-help@hadoop.apache.org

View raw message