hadoop-yarn-issues mailing list archives

From "Andy Isaacson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-59) "Text File Busy" errors launching MR tasks
Date Wed, 29 Aug 2012 21:16:08 GMT

    [ https://issues.apache.org/jira/browse/YARN-59?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13444423#comment-13444423 ]

Andy Isaacson commented on YARN-59:


I commented on MAPREDUCE-2374, but I guess this is a better forum.
# I'd be happy to add a testcase, but how would you suggest that we detect the race condition?
Note that the ETXTBSY failure happens only on some systems, only under load, and only if a
script is written from the daemon and then executed by the shell.
# Note that if we simply add a testcase which reproduces the failure scenario, it will continue
to pass even if the bug is reintroduced. It will only fail if the race condition is actually
triggered, which seems to be impossible on some Linux versions (I never managed to reproduce
ETXTBSY on Debian or Ubuntu, for example).
# There are other legitimate uses of "bash -c" where that's the only reasonable way to get
the desired behavior, so we can't just prohibit the construct.
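
For readers unfamiliar with ETXTBSY, the following is a minimal sketch of the underlying condition, not the Hadoop code and not a regression test for the race itself. It assumes a Linux system where execve() refuses to run a file that some process still has open for writing. The function name run_while_open_for_write and the tiny script body are hypothetical, chosen only for illustration. The sketch holds the write descriptor open on purpose so the condition is deterministic; in the scenario described above the window is brief, which is why the failure shows up only under load.

```python
import errno
import os
import stat
import subprocess
import tempfile


def run_while_open_for_write(direct: bool):
    """Write a script, keep its write descriptor open, then launch it.

    Returns None if the launch succeeded, or the errno of the OSError
    raised while trying to start the process.
    """
    fd, path = tempfile.mkstemp(suffix=".sh")
    try:
        os.write(fd, b"#!/bin/sh\nexit 0\n")
        os.fchmod(fd, stat.S_IRWXU)
        # The descriptor is deliberately left open for writing here.
        # direct=True asks the kernel to exec the script file itself;
        # direct=False lets the shell open and read the file instead.
        argv = [path] if direct else ["/bin/sh", path]
        try:
            subprocess.run(argv, check=True)
            return None
        except OSError as exc:
            return exc.errno
    finally:
        os.close(fd)
        os.unlink(path)


if __name__ == "__main__":
    print("direct exec:", run_while_open_for_write(direct=True))
    print("via /bin/sh:", run_while_open_for_write(direct=False))
```

On a kernel that enforces the check, the direct launch would be expected to report errno.ETXTBSY, while handing the path to the shell as an argument should succeed because the shell only reads the file. Results may differ on other platforms or where the temporary directory is mounted noexec.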

> "Text File Busy" errors launching MR tasks
> ------------------------------------------
>                 Key: YARN-59
>                 URL: https://issues.apache.org/jira/browse/YARN-59
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Todd Lipcon
>            Assignee: Andy Isaacson
>             Fix For: 2.2.0-alpha, 0.23.3
> Some very small percentage of tasks fail with a "Text file busy" error.
> The following was the original diagnosis:
> {quote}
> Our use of PrintWriter in TaskController.writeCommand is unsafe, since that class swallows
> all IO exceptions. We're not currently checking for errors, which I'm seeing result in occasional
> task failures with the message "Text file busy" - assumedly because the close() call is failing
> silently for some reason.
> {quote}
> .. but turned out to be another issue as well (see below)

