Date: Tue, 21 Aug 2012 18:35:38 +1100 (NCT)
From: "Ahmed Radwan (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-4469) Resource calculation in child tasks is CPU-heavy

[ https://issues.apache.org/jira/browse/MAPREDUCE-4469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13438519#comment-13438519 ]

Ahmed Radwan commented on MAPREDUCE-4469:
-----------------------------------------

I have also looked into Todd's suggestions above.
Here is an updated patch that adds a further optimization: it filters processes by owner and start time, and caches the excluded processes to avoid recalculating them in future calls. I'll still be adding some tests.

> Resource calculation in child tasks is CPU-heavy
> ------------------------------------------------
>
>                 Key: MAPREDUCE-4469
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4469
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: performance, task
>    Affects Versions: 1.0.3
>            Reporter: Todd Lipcon
>            Assignee: Ahmed Radwan
>         Attachments: MAPREDUCE-4469.patch, MAPREDUCE-4469_rev2.patch, MAPREDUCE-4469_rev3.patch
>
>
> In doing some benchmarking on a hadoop-1 derived codebase, I noticed that each of the child tasks was doing a ton of syscalls. Upon stracing, I noticed that it's spending a lot of time looping through all the files in /proc to calculate resource usage.
> As a test, I added a flag to disable use of the ResourceCalculatorPlugin within the tasks. On a CPU-bound 500G-sort workload, this improved total job runtime by about 10% (map slot-seconds by 14%, reduce slot-seconds by 8%)
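
For readers following the thread, the filter-and-cache idea described in the comment can be sketched roughly as below. This is an illustration only, not the code in the attached patches (which are Java and are not reproduced in this message). The class name CandidateProcessFilter and the injected list_pids / read_info callables are hypothetical; in a real scanner they would presumably be backed by a listing of /proc and a read of each process's owner and start time.

```python
from typing import Callable, Iterable, List, Set, Tuple

# (owner uid, process start time); the units are whatever the caller uses,
# as long as they match task_start_time. Hypothetical type for this sketch.
ProcInfo = Tuple[int, int]


class CandidateProcessFilter:
    """Hypothetical sketch: skip processes that cannot belong to a task.

    A process is excluded if it is owned by a different user or started
    before the task did. Excluded pids are remembered so that later scans
    do not read their details again.
    """

    def __init__(self, task_uid: int, task_start_time: int,
                 list_pids: Callable[[], Iterable[int]],
                 read_info: Callable[[int], ProcInfo]) -> None:
        self.task_uid = task_uid
        self.task_start_time = task_start_time
        self._list_pids = list_pids
        self._read_info = read_info
        self._excluded: Set[int] = set()

    def candidates(self) -> List[int]:
        live = set(self._list_pids())
        # Drop cached pids that have exited, so a pid that disappears and is
        # later reused gets checked again. Reuse between two consecutive
        # scans is not detected by this sketch.
        self._excluded &= live
        result = []
        for pid in sorted(live):
            if pid in self._excluded:
                continue  # cached exclusion: no per-process read needed
            uid, start = self._read_info(pid)
            if uid != self.task_uid or start < self.task_start_time:
                self._excluded.add(pid)
                continue
            result.append(pid)
        return result
```

The expensive part of each scan is the per-process read, so the sketch pays for it once per unrelated process and then only for processes that could still belong to the task. Whether the actual patch handles pid reuse or cache eviction this way is not stated in the comment.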