Date: Tue, 13 Jan 2015 19:17:35 +0000 (UTC)
From: "Siqi Li (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-4815) FileOutputCommitter.commitJob can be very slow for jobs with many output files

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14275733#comment-14275733 ]

Siqi Li commented on MAPREDUCE-4815:
------------------------------------

I have attached patch v9, based on the design suggestions from Jason and Gera.

I have also run a set of performance test jobs, as follows:

1. Teragen job with 500 mappers

               Job Execution Time    Job Commit Time
   Old APIs    43 sec                31 sec
   New APIs    31 sec                0.2 sec

   Savings: ~38.7%

2. Teragen job with 5K mappers

               Job Execution Time    Job Commit Time
   Old APIs    6 min 8 sec           2 min
   New APIs    4 min 10 sec          0.3 sec

   Savings: ~33.3%

3. Teragen job with 20K mappers

               Job Execution Time    Job Commit Time
   Old APIs    23 min 45 sec         10 min
   New APIs    15 min 36 sec         0.5 sec

   Savings: ~33.3%

According to the tables above, the average time saving for the teragen jobs is ~33.3%, and the job commit time with the new APIs is almost instant compared to the old APIs, where it is linear in the number of tasks. Note that these numbers were measured with the entire cluster used by this job only. In a real scenario, the job commit may take much longer when the NNs are under heavy load.

In addition, the new APIs are optimized for large jobs with a small average task finish time. Such jobs need little time to finish all their tasks but spend a lot of time committing with the old APIs, so a large portion of the overall job time goes to the commit. With the new APIs the commit time is greatly reduced, hence the large saving. For long-running small jobs the saving might be negligible, but it will not be worse than with the old APIs.

> FileOutputCommitter.commitJob can be very slow for jobs with many output files
> ------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4815
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4815
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.3, 2.0.1-alpha, 2.4.1
>            Reporter: Jason Lowe
>            Assignee: Siqi Li
>         Attachments: MAPREDUCE-4815.v3.patch, MAPREDUCE-4815.v4.patch, MAPREDUCE-4815.v5.patch, MAPREDUCE-4815.v6.patch, MAPREDUCE-4815.v7.patch, MAPREDUCE-4815.v8.patch, MAPREDUCE-4815.v9.patch
>
>
> If a job generates many files to commit, then the commitJob method call at the end of the job can take minutes. This is a performance regression from 1.x, as 1.x had the tasks commit directly to the final output directory as they were completing and commitJob had very little to do. The commit work was processed in parallel and overlapped the processing of outstanding tasks. In 0.23/2.x, the commit is single-threaded and waits until all tasks have completed before commencing.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
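
To make the contrast in the issue description concrete, here is a minimal toy sketch on the local filesystem. It is not Hadoop's FileOutputCommitter and not the attached patch: the class `CommitSketch`, its method names, and the `pending`/`final` directory layout are all hypothetical, and tasks run one after another rather than in parallel. It only counts how many files the final, serial job commit still has to move under each strategy.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Toy model of the two commit strategies; this is not Hadoop's FileOutputCommitter. */
public class CommitSketch {

    /** Moves every file in taskDir into finalDir and returns how many files were moved. */
    static int moveAll(Path taskDir, Path finalDir) throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(taskDir)) {
            files = listing.collect(Collectors.toList());
        }
        for (Path src : files) {
            Files.move(src, finalDir.resolve(src.getFileName()));
        }
        return files.size();
    }

    /** A hypothetical task: writes one output file into its own pending directory. */
    static Path runTask(Path pendingRoot, int taskId) throws IOException {
        Path taskDir = Files.createDirectories(pendingRoot.resolve("task_" + taskId));
        Files.write(taskDir.resolve("part-" + taskId),
                ("data " + taskId).getBytes(StandardCharsets.UTF_8));
        return taskDir;
    }

    /**
     * Runs the given number of tasks in a fresh temp directory and returns the number
     * of files that the final, single-threaded job commit still had to move.
     */
    static int filesMovedAtJobCommit(int tasks, boolean commitAsTasksFinish) {
        try {
            Path root = Files.createTempDirectory("commit-sketch");
            Path pending = Files.createDirectories(root.resolve("pending"));
            Path finalDir = Files.createDirectories(root.resolve("final"));
            List<Path> taskDirs = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                Path taskDir = runTask(pending, i);
                taskDirs.add(taskDir);
                if (commitAsTasksFinish) {
                    // 1.x-style: output goes to the final directory as each task completes.
                    moveAll(taskDir, finalDir);
                }
            }
            int movedAtJobCommit = 0;
            for (Path taskDir : taskDirs) {
                // Job commit: one serial pass over whatever is still pending.
                movedAtJobCommit += moveAll(taskDir, finalDir);
            }
            return movedAtJobCommit;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("deferred commit:  " + filesMovedAtJobCommit(100, false));
        System.out.println("commit as tasks finish: " + filesMovedAtJobCommit(100, true));
    }
}
```

In the deferred case the job-commit work grows with the number of task outputs, while in the other case nothing is left to move at the end. The sketch leaves its temp directories behind and ignores failure handling, which a real committer has to deal with.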
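
The comment does not say which baseline its savings percentages use. As a small arithmetic sketch, the snippet below takes one plausible reading, (old - new) / old on the reported job execution times, so its output need not match the figures quoted in the comment. The class and method names are hypothetical; only the timings come from the comment.

```java
import java.util.Locale;

/** Arithmetic sketch: relative saving in job execution time, under an assumed baseline. */
public class SavingsSketch {

    /** Converts a "min sec" timing into seconds. */
    static double seconds(int minutes, int secs) {
        return minutes * 60 + secs;
    }

    /** Assumed definition: fraction of the old time that the new run no longer needs. */
    static double relativeSaving(double oldSeconds, double newSeconds) {
        return (oldSeconds - newSeconds) / oldSeconds;
    }

    public static void main(String[] args) {
        String[] labels = {"500 mappers", "5K mappers", "20K mappers"};
        // {old execution time, new execution time}, as reported in the comment.
        double[][] runs = {
            {seconds(0, 43), seconds(0, 31)},
            {seconds(6, 8), seconds(4, 10)},
            {seconds(23, 45), seconds(15, 36)},
        };
        for (int i = 0; i < runs.length; i++) {
            double saving = relativeSaving(runs[i][0], runs[i][1]);
            System.out.printf(Locale.ROOT, "%s: %.1f%%%n", labels[i], 100 * saving);
        }
    }
}
```

Swapping the denominator to the new time, or comparing commit time against total job time, gives different percentages, so the definition matters when comparing runs.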