hadoop-yarn-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-894) NodeHealthScriptRunner timeout checking is inaccurate on Windows
Date Mon, 08 Jul 2013 18:27:48 GMT

    [ https://issues.apache.org/jira/browse/YARN-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702237#comment-13702237 ]

Chris Nauroth commented on YARN-894:

Hi, Chuan.  This patch looks good, but I'm seeing a failure in the test on my Windows machine.
 If I run just {{TestNodeHealthService#testNodeHealthScript}}, then it passes.  If I run the
whole {{TestNodeHealthService}} suite, then that same test fails with:

testNodeHealthScript(org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService)  Time elapsed: 187 sec  <<< ERROR!
java.io.FileNotFoundException: C:\hdc\hadoop-yarn-project\hadoop-yarn\hadoop-yarn-server\hadoop-yarn-server-nodemanager\target\org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService-localDir\failingscript.cmd (The process cannot access the file because it is being used by another process)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
        at org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService.writeNodeHealthScriptFile(TestNodeHealthService.java:82)
        at org.apache.hadoop.yarn.server.nodemanager.TestNodeHealthService.testNodeHealthScript(TestNodeHealthService.java:154)

Do you see this happen too?  It's probably a file leak out of the prior test.
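
For illustration only: this message does not establish what is holding failingscript.cmd open. It could be an unclosed stream in the test, or a script process from the prior test that is still running. The sketch below covers only the first possibility. {{ScriptFileWriter}} is a hypothetical helper, not the code in {{TestNodeHealthService}}; it shows a write that always releases the file handle, so a later rewrite of the same file is not blocked by the writer itself.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper, not part of Hadoop: writes a script file without leaking the handle. */
public class ScriptFileWriter {

    /** Overwrites scriptFile with body and closes the stream even if the write fails. */
    public static void write(File scriptFile, String body) throws IOException {
        // try-with-resources closes the stream on every exit path, so the
        // writer never keeps the file open after this method returns.
        try (FileOutputStream out = new FileOutputStream(scriptFile)) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
            out.flush();
        }
    }
}
```

If the handle is instead held by a script process that is still executing, closing streams in the test will not help; the test would need to wait for that process to exit before rewriting the file.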
> NodeHealthScriptRunner timeout checking is inaccurate on Windows
> ----------------------------------------------------------------
>                 Key: YARN-894
>                 URL: https://issues.apache.org/jira/browse/YARN-894
>             Project: Hadoop YARN
>          Issue Type: Bug
>    Affects Versions: 3.0.0, 2.1.0-beta
>            Reporter: Chuan Liu
>            Assignee: Chuan Liu
>            Priority: Minor
>         Attachments: ReadProcessStdout.java, wait.cmd, wait.sh, YARN-894-trunk.patch
> In {{NodeHealthScriptRunner}}, we set the HealthChecker status based on the Shell
execution results. Some statuses are based on the exception thrown during the Shell script execution.
> Currently, we catch a non-ExitCodeException from ShellCommandExecutor, and if Shell
has the timeout status set at the same time, we also set the HealthChecker status to timeout.
> We have the following execution sequence in Shell:
> 1) In the main thread, schedule a delayed timer task that will kill the original process
upon timeout.
> 2) In the main thread, open a buffered reader on the process's input stream
({{Process#getInputStream()}}, i.e. the process's standard output) and read from it.
> 3) When the timeout happens, the timer task calls {{Process#destroy()}}
to kill the main process.
> On Linux, when the timeout happens and the process is killed, the buffered reader throws an
IOException with the message "Stream closed" in the main thread.
> On Windows, no IOException is thrown. The reader only returns "-1", which
indicates the end of the stream. As a result, the timeout status is not set on Windows, and
{{TestNodeHealthService}} fails on Windows because of this.
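
The sequence described above can be sketched in a platform-neutral way. This is a minimal illustration, not the Hadoop {{Shell}} class and not the attached patch: {{TimeoutAwareRunner}}, its {{Status}} enum, and the {{finished}} flag are names assumed here. The point it shows is that the timeout decision comes from a flag set by the timer task, so it does not matter whether the reader ends with an IOException (as reported on Linux) or with a plain end-of-stream (as reported on Windows).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: decide "timed out" from a flag, not from a reader exception. */
public class TimeoutAwareRunner {

    public enum Status { SUCCESS, FAILED, TIMED_OUT }

    public static Status run(ProcessBuilder builder, long timeoutMs) {
        final Process process;
        try {
            process = builder.redirectErrorStream(true).start();
        } catch (IOException e) {
            return Status.FAILED; // the command could not be started
        }
        // Whichever side flips this flag first decides how the run ended.
        final AtomicBoolean finished = new AtomicBoolean(false);

        // Step 1: schedule a delayed task that kills the process on timeout.
        Timer timer = new Timer("timeout-killer", true);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                if (finished.compareAndSet(false, true)) {
                    process.destroy(); // Step 3: kill on timeout
                }
            }
        }, timeoutMs);

        try {
            // Step 2: drain the process output in the calling thread.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream()))) {
                while (reader.readLine() != null) {
                    // output is discarded in this sketch
                }
            } catch (IOException e) {
                // Deliberately ignored: an exception here is one possible
                // symptom of the kill, but it is not used as the signal.
            }
            int exitCode = process.waitFor();
            // If the timer task won the race, this run is a timeout,
            // however the read loop happened to end.
            if (!finished.compareAndSet(false, true)) {
                return Status.TIMED_OUT;
            }
            return exitCode == 0 ? Status.SUCCESS : Status.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            process.destroy();
            return Status.FAILED;
        } finally {
            timer.cancel();
        }
    }
}
```

For example, on a system with a {{sleep}} command, {{TimeoutAwareRunner.run(new ProcessBuilder("sleep", "5"), 300)}} should report {{TIMED_OUT}} under these assumptions, regardless of which way the read loop terminates.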
