hadoop-common-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-10146) Workaround JDK7 Process fd close bug
Date Wed, 11 Dec 2013 17:29:09 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-10146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845557#comment-13845557 ]

Daryn Sharp commented on HADOOP-10146:
--------------------------------------

Anyone want to review?  After moving to JDK7 in production, we had many NMs under load going
OOM and crashing due to this bug.  Task retries masked the fact that the cluster was slowly
shrinking.  As noted above, we've been running production clusters for 8 months with this patch.

> Workaround JDK7 Process fd close bug
> ------------------------------------
>
>                 Key: HADOOP-10146
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10146
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: util
>    Affects Versions: 0.23.0, 2.0.0-alpha, 3.0.0
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>            Priority: Critical
>         Attachments: HADOOP-10129.branch-23.patch, HADOOP-10129.patch
>
>
> JDK7's {{Process}} output streams have an async fd-close race bug.  This manifests as
commands run via o.a.h.u.Shell causing threads to hang, OOM, or exhibit other bizarre behavior.
 The NM is likely to encounter the bug under heavy load.
> Specifically, {{ProcessBuilder}}'s {{UNIXProcess}} starts a thread to reap the process
and drain stdout/stderr to avoid a lingering zombie process.  A race occurs if the thread
using the stream closes it and the underlying fd is recycled/reopened while the reaper is
still draining it.  {{ProcessPipeInputStream.drainInputStream}} will OOM allocating an array
if {{in.available()}} returns a huge number, or may wreak havoc by incorrectly draining the
recycled fd.
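
To make the allocation hazard concrete, here is a rough sketch of the drain pattern
described above. It is not the JDK source and not the attached patch. The class
DrainSketch, the helper LyingStream, and the maxTotal cap are hypothetical names
introduced only for illustration. drainUnbounded sizes its buffer from whatever
available() reports, which is the pattern that can OOM when the fd has been recycled.
drainBounded is one possible defensive variant that reads through a fixed-size buffer
and stops at a caller-chosen limit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical illustration class; not part of the JDK or Hadoop.
class DrainSketch {

    // Trusts available() for the allocation size. If available() reports a
    // huge value (e.g. the fd now refers to something else), the array
    // allocation itself can throw OutOfMemoryError.
    static byte[] drainUnbounded(InputStream in) throws IOException {
        int n = 0;
        int j;
        byte[] a = null;
        while ((j = in.available()) > 0) {
            a = (a == null) ? new byte[j] : Arrays.copyOf(a, n + j);
            int r = in.read(a, n, j);
            if (r < 0) {
                break; // end of stream despite available() > 0
            }
            n += r;
        }
        return (a == null || n == a.length) ? a : Arrays.copyOf(a, n);
    }

    // Defensive variant (an assumption, not the actual fix): never allocates
    // based on available(), and stops once maxTotal bytes were collected.
    static byte[] drainBounded(InputStream in, int maxTotal) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int j;
        while (out.size() < maxTotal && (j = in.available()) > 0) {
            int want = Math.min(Math.min(j, buf.length), maxTotal - out.size());
            int r = in.read(buf, 0, want);
            if (r < 0) {
                break;
            }
            out.write(buf, 0, r);
        }
        return out.toByteArray();
    }

    // Simulates a stream whose available() is wildly wrong.
    static class LyingStream extends ByteArrayInputStream {
        LyingStream(byte[] data) {
            super(data);
        }

        @Override
        public synchronized int available() {
            return Integer.MAX_VALUE;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello".getBytes("UTF-8");
        // Calling drainUnbounded on LyingStream would attempt a ~2 GB allocation.
        byte[] got = drainBounded(new LyingStream(data), 1024);
        System.out.println(new String(got, "UTF-8"));
    }
}
```

The bounded variant only limits memory use. It does not address the other half of
the problem, namely that the reaper may consume bytes from an fd that now belongs
to some unrelated stream.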



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
