hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files
Date Mon, 26 Nov 2012 19:40:58 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504022#comment-13504022
] 

Arun C Murthy commented on MAPREDUCE-4815:
------------------------------------------

Hmm... my bad, this looks like an oversight vis-a-vis hadoop-1.

Maybe the correct solution is to create a top-level (peer of user's expected output dir) _temporary_${mapred.output.dir}_${app_attempt_id}
directory and move it to ${mapred.output.dir}. This will mean just one HDFS mv.

Thoughts?
                
> FileOutputCommitter.commitJob can be very slow for jobs with many output files
> ------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4815
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.3, 2.0.1-alpha
>            Reporter: Jason Lowe
>
> If a job generates many files to commit then the commitJob method call at the end of
the job can take minutes.  This is a performance regression from 1.x, as 1.x had the tasks
commit directly to the final output directory as they were completing and commitJob had very
little to do.  The commit work was processed in parallel and overlapped the processing of
outstanding tasks.  In 0.23/2.x, the commit is single-threaded and waits until all tasks have
completed before commencing.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message