Return-Path: X-Original-To: apmail-hadoop-mapreduce-issues-archive@minotaur.apache.org Delivered-To: apmail-hadoop-mapreduce-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 76EA89A94 for ; Wed, 10 Dec 2014 20:48:15 +0000 (UTC) Received: (qmail 14382 invoked by uid 500); 10 Dec 2014 20:48:15 -0000 Delivered-To: apmail-hadoop-mapreduce-issues-archive@hadoop.apache.org Received: (qmail 14317 invoked by uid 500); 10 Dec 2014 20:48:15 -0000 Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: mapreduce-issues@hadoop.apache.org Delivered-To: mailing list mapreduce-issues@hadoop.apache.org Received: (qmail 14305 invoked by uid 99); 10 Dec 2014 20:48:15 -0000 Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 10 Dec 2014 20:48:15 +0000 Date: Wed, 10 Dec 2014 20:48:15 +0000 (UTC) From: "Siqi Li (JIRA)" To: mapreduce-issues@hadoop.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14241704#comment-14241704 ] Siqi Li commented on MAPREDUCE-4815: ------------------------------------ [~dlmarion] Although, doing rename twice would result in wrong output directory structure, it seems commitTask() will not do rename directory at all. Since, the destination directory has already been created during setupJob() {code} if (!fs.mkdirs(jobAttemptPath)) { LOG.error("Mkdirs failed to create " + jobAttemptPath); } {code} Have you check the log to see if your job have logged error in creating this directory > FileOutputCommitter.commitJob can be very slow for jobs with many output files > ------------------------------------------------------------------------------ > > Key: MAPREDUCE-4815 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815 > Project: Hadoop Map/Reduce > Issue Type: Bug > Components: mrv2 > Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1 > Reporter: Jason Lowe > Assignee: Siqi Li > Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch > > > If a job generates many files to commit then the commitJob method call at the end of the job can take minutes. This is a performance regression from 1.x, as 1.x had the tasks commit directly to the final output directory as they were completing and commitJob had very little to do. The commit work was processed in parallel and overlapped the processing of outstanding tasks. In 0.23/2.x, the commit is single-threaded and waits until all tasks have completed before commencing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)