hadoop-mapreduce-dev mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files
Date Wed, 21 Nov 2012 16:47:59 GMT
Jason Lowe created MAPREDUCE-4815:
-------------------------------------

             Summary: FileOutputCommitter.commitJob can be very slow for jobs with many output files
                 Key: MAPREDUCE-4815
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: mrv2
    Affects Versions: 2.0.1-alpha, 0.23.3
            Reporter: Jason Lowe


If a job generates many files to commit, the commitJob method call at the end of the job
can take minutes.  This is a performance regression from 1.x, where tasks committed
directly to the final output directory as they completed, leaving commitJob very little
to do.  That commit work ran in parallel and overlapped the processing of outstanding
tasks.  In 0.23/2.x, the commit is single-threaded and does not start until all tasks have
completed.
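
To make the contrast concrete, here is a rough sketch of the two commit shapes using
plain java.nio on a local filesystem.  It is not the FileOutputCommitter code and not a
proposed patch: the class CommitSketch and its methods (commitSerially, commitInParallel,
moveAll, makeTaskDirs) are hypothetical names invented for illustration, and a thread
pool only approximates the 1.x behaviour of tasks committing as they finish.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration only; none of these names come from Hadoop.
class CommitSketch {

    // 0.23/2.x shape: one thread moves every task's files after all tasks are done.
    static int commitSerially(List<Path> taskDirs, Path finalDir) throws IOException {
        int moved = 0;
        for (Path taskDir : taskDirs) {
            moved += moveAll(taskDir, finalDir);
        }
        return moved;
    }

    // Roughly the 1.x shape: each task's output is moved independently, so the
    // per-file cost is spread across workers instead of paid in one serial pass.
    static int commitInParallel(List<Path> taskDirs, Path finalDir, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (Path taskDir : taskDirs) {
                results.add(pool.submit(() -> moveAll(taskDir, finalDir)));
            }
            int moved = 0;
            for (Future<Integer> result : results) {
                moved += result.get();
            }
            return moved;
        } finally {
            pool.shutdown();
        }
    }

    // Moves every file in one task directory into the final output directory.
    static int moveAll(Path taskDir, Path finalDir) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(taskDir)) {
            files = listing.collect(Collectors.toList());
        }
        for (Path file : files) {
            Files.move(file, finalDir.resolve(file.getFileName()));
        }
        return files.size();
    }

    // Creates `tasks` directories under root, each holding `filesPerTask` empty files.
    static List<Path> makeTaskDirs(Path root, int tasks, int filesPerTask) throws IOException {
        List<Path> dirs = new ArrayList<>();
        for (int t = 0; t < tasks; t++) {
            Path dir = Files.createDirectories(root.resolve("task_" + t));
            for (int i = 0; i < filesPerTask; i++) {
                Files.createFile(dir.resolve("part-" + t + "-" + i));
            }
            dirs.add(dir);
        }
        return dirs;
    }

    public static void main(String[] args) throws Exception {
        Path root = Files.createTempDirectory("commit-sketch");
        Path finalDir = Files.createDirectory(root.resolve("output"));
        List<Path> taskDirs = makeTaskDirs(root.resolve("_temporary"), 8, 100);
        int moved = commitInParallel(taskDirs, finalDir, 4);
        System.out.println("files committed: " + moved);
    }
}
```

On a local disk the difference between the two methods is small; the cost described
above shows up when each move is a round trip to a remote namenode and the serial
version pays that latency once per file with nothing else overlapping it.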
