hadoop-mapreduce-issues mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1471) FileOutputCommitter does not safely clean up it's temporary files
Date Tue, 09 Feb 2010 17:15:27 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831546#action_12831546 ]

Arun C Murthy commented on MAPREDUCE-1471:

Jim, all file-based output-formats check that their output directory is *not* already present
when they start, i.e. 'working_path' is owned by one and only one job; hence this behaviour
is correct.
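
For readers less familiar with that code path, here is a minimal sketch of the up-front check being referred to, assuming the 0.20-era mapred API (the actual logic lives in FileOutputFormat.checkOutputSpecs(); the class and helper names below are illustrative only):

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class OutputDirCheck {
      // Fail the job up front if the output directory already exists, so that
      // a given working_path can only ever belong to a single running job.
      static void checkOutputDir(Path outputDir, Configuration conf) throws IOException {
        FileSystem fs = outputDir.getFileSystem(conf);
        if (fs.exists(outputDir)) {
          throw new IOException("Output directory " + outputDir + " already exists");
        }
      }
    }

Because a pre-existing output directory is rejected before the job starts, two jobs can never legitimately share the same working_path, which is the basis of the "works as intended" argument above.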

> FileOutputCommitter does not safely clean up it's temporary files
> -----------------------------------------------------------------
>                 Key: MAPREDUCE-1471
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1471
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 0.20.1
>            Reporter: Jim Finnessy
>   Original Estimate: 4h
>  Remaining Estimate: 4h
> When the FileOutputCommitter cleans up during its cleanupJob method, it potentially
> deletes the temporary files of other concurrent jobs.
> Since the temporary files for all concurrent jobs are written to working_path/_temporary,
> any job that shares the same working_path will remove the temporary files of every
> currently executing job when it deletes working_path/_temporary during job cleanup.
> If the output file names are guaranteed by the client application to be unique, the temporary
> files/directories should also be guaranteed to be unique to avoid this problem. Suggest modifying
> cleanupJob to only remove files that it created itself.
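
A hypothetical sketch of the reporter's suggestion, assuming a per-job subdirectory under _temporary; the directory layout, method and parameter names here are illustrative only and are not part of any actual patch:

    import java.io.IOException;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    public class PerJobTempCleanup {
      // Delete only this job's temporary subtree, e.g.
      // working_path/_temporary/<job-id>, leaving other jobs' files untouched.
      static void cleanupJobTemp(JobConf conf, Path outputPath, String jobId) throws IOException {
        Path jobTempDir = new Path(new Path(outputPath, "_temporary"), jobId);
        FileSystem fs = jobTempDir.getFileSystem(conf);
        if (fs.exists(jobTempDir)) {
          fs.delete(jobTempDir, true);
        }
      }
    }

Scoping the temporary directory by job id would make cleanupJob safe even if two jobs did share a working_path, though as noted in the comment above, the existing output-directory check is intended to rule that situation out.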

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
