hadoop-mapreduce-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4815) Speed up FileOutputCommitter#commitJob for many output files
Date Wed, 11 Mar 2015 14:37:44 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14356947#comment-14356947
] 

Hudson commented on MAPREDUCE-4815:
-----------------------------------

FAILURE: Integrated in Hadoop-Hdfs-trunk #2061 (See [https://builds.apache.org/job/Hadoop-Hdfs-trunk/2061/])
MAPREDUCE-4815. Speed up FileOutputCommitter#commitJob for many output files. (Siqi Li via
gera) (gera: rev aa92b764a7ddb888d097121c4d610089a0053d11)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapreduce/lib/output/TestFileOutputCommitter.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/resources/mapred-default.xml
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/FileOutputCommitter.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/test/java/org/apache/hadoop/mapred/TestFileOutputCommitter.java
* hadoop-mapreduce-project/CHANGES.txt


> Speed up FileOutputCommitter#commitJob for many output files
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-4815
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv2
>    Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
>            Reporter: Jason Lowe
>            Assignee: Siqi Li
>              Labels: perfomance
>             Fix For: 2.7.0
>
>         Attachments: MAPREDUCE-4815.v10.patch, MAPREDUCE-4815.v11.patch, MAPREDUCE-4815.v12.patch,
MAPREDUCE-4815.v13.patch, MAPREDUCE-4815.v14.patch, MAPREDUCE-4815.v15.patch, MAPREDUCE-4815.v16.patch,
MAPREDUCE-4815.v17.patch, MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch,
MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, MAPREDUCE-4815.v9.patch
>
>
> If a job generates many files to commit then the commitJob method call at the end of
the job can take minutes.  This is a performance regression from 1.x, as 1.x had the tasks
commit directly to the final output directory as they were completing and commitJob had very
little to do.  The commit work was processed in parallel and overlapped the processing of
outstanding tasks.  In 0.23/2.x, the commit is single-threaded and waits until all tasks have
completed before commencing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message