hadoop-mapreduce-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2374) Should not use PrintWriter to write taskjvm.sh
Date Tue, 24 Jul 2012 05:48:35 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13421198#comment-13421198
] 

Colin Patrick McCabe commented on MAPREDUCE-2374:
-------------------------------------------------

I just thought of something.  Suppose that the JVM is holding blahblahblah.sh open for write,
and meanwhile another thread forks a bash process (or something).  After the fork completes,
the child process will also hold blahblahblah.sh open for write (O_WRONLY), even if the JVM
has since closed its own descriptor.  At the very least, this is a race condition that could
lead to "mysterious" failures, since you don't know when the forked process will next get
scheduled relative to the parent process.
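
To make the inheritance concrete, here is a minimal sketch in Python rather than Java, assuming
a POSIX system.  It is not taken from the Hadoop code; the helper name child_still_holds_fd is
made up for illustration.  The parent opens a script for write, forks, and closes its own
descriptor; the child can still write through its inherited copy.

```python
import os
import tempfile


def child_still_holds_fd() -> bool:
    """Return True if a forked child can write via an fd the parent already closed."""
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "blahblahblah.sh")  # placeholder name from the comment
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o700)

    # Pipe used only to tell the child "the parent has closed its copy".
    ready_r, ready_w = os.pipe()

    pid = os.fork()
    if pid == 0:
        # Child: wait for the parent's signal, then write through the inherited fd.
        status = 1
        try:
            os.close(ready_w)
            os.read(ready_r, 1)
            os.write(fd, b"# written by the child\n")
            status = 0
        finally:
            os._exit(status)

    # Parent: close its own descriptor, then let the child proceed.
    os.close(ready_r)
    os.close(fd)
    os.write(ready_w, b"x")
    os.close(ready_w)

    _, wait_status = os.waitpid(pid, 0)
    os.unlink(path)
    os.rmdir(workdir)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
```

While the child keeps that descriptor, the file is still open for write somewhere on the
system, which is the condition under which executing it can fail with "Text file busy".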

The O_CLOEXEC flag was introduced in Linux 2.6.23 to solve this problem: it sets close-on-exec
atomically at open() time, so the FD is closed automatically when the forked child calls exec.
However, I didn't see it being used in the strace output you posted.  And it's certainly not
available on RHEL5 and earlier.
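
As a rough illustration of what O_CLOEXEC changes, here is a sketch in Python, assuming Linux
and Python 3.4 or later.  The helper fd_survives_exec and the PROBE child program are
hypothetical names, not anything from Hadoop.  Python already opens descriptors as
non-inheritable by default, so the sketch sets the inheritable state explicitly to show both
cases.

```python
import os
import subprocess
import sys
import tempfile

# Child program: exits 0 only if the given fd is open and refers to the expected file.
PROBE = """
import os, sys
fd, dev, ino = (int(a) for a in sys.argv[1:4])
try:
    st = os.fstat(fd)
except OSError:
    sys.exit(1)
sys.exit(0 if (st.st_dev, st.st_ino) == (dev, ino) else 1)
"""


def fd_survives_exec(cloexec: bool) -> bool:
    """Open a file for write, exec a child, and report whether the child still has the fd."""
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "script.sh")
        flags = os.O_WRONLY | os.O_CREAT
        if cloexec:
            flags |= os.O_CLOEXEC  # close-on-exec is set atomically by open()
        fd = os.open(path, flags, 0o700)
        try:
            # Make the state explicit, since Python defaults to non-inheritable.
            os.set_inheritable(fd, not cloexec)
            st = os.fstat(fd)
            result = subprocess.run(
                [sys.executable, "-c", PROBE, str(fd), str(st.st_dev), str(st.st_ino)],
                close_fds=False,  # do not let subprocess close inheritable fds itself
            )
            return result.returncode == 0
        finally:
            os.close(fd)
```

Calling fd_survives_exec(False) should report that the exec'd child still holds the write
descriptor, while fd_survives_exec(True) should report that it does not.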

If this is true, then I guess the solution Andy posted earlier is probably the best way to
go.  Just get rid of the -c and this behavior will be masked.
                
> Should not use PrintWriter to write taskjvm.sh
> ----------------------------------------------
>
>                 Key: MAPREDUCE-2374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2374
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>             Fix For: 0.22.1
>
>         Attachments: failed_taskjvmsh.strace, mapreduce-2374-on-20sec.txt, mapreduce-2374.txt,
>                      mapreduce-2374.txt, successfull_taskjvmsh.strace
>
>
> Our use of PrintWriter in TaskController.writeCommand is unsafe, since that class swallows
> all IO exceptions. We're not currently checking for errors, which I'm seeing result in occasional
> task failures with the message "Text file busy" - assumedly because the close() call is failing
> silently for some reason.


        
