Date: Mon, 2 Jul 2012 22:40:57 +0000 (UTC)
From: "Shrinivas Joshi (JIRA)"
To: mapreduce-issues@hadoop.apache.org
Subject: [jira] [Commented] (MAPREDUCE-4381) Make PROGRESS_INTERVAL of org.apache.hadoop.mapred.Task a tunable

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405403#comment-13405403 ]

Shrinivas Joshi commented on MAPREDUCE-4381:
--------------------------------------------

Thanks for the code review.
I agree about the possibility of creating scalability issues. Setting progress interval to a very small value may lead to excessive status update events. Can we address this by setting a lower bound requirement on the value of progress interval that the user can set? If so, how does 500 milliseconds sound as the lower bound?

I will address your comments in the 1st and 2nd bullet above in the revised version of this patch along with other changes.

As you may have seen I have included a short description of the new property in src/mapred/mapred-default.xml file. Is there any other more appropriate file/location where this needs to be documented?

Since this patch only makes progress_interval a tunable, would it suffice to test whether the value returned by JobConf matches the one set in mapred-site.xml?

> Make PROGRESS_INTERVAL of org.apache.hadoop.mapred.Task a tunable
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-4381
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4381
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: task, tasktracker
>            Reporter: Shrinivas Joshi
>            Priority: Minor
>         Attachments: progress_interval.patch
>
>
> Currently PROGRESS_INTERVAL is a hard-coded value and is set to 3000 msec. We tried making it a tunable and experimented with different values. In some cases setting it to a smaller value like 1000 msec helps significantly improve performance of short running jobs such as piEstimator. This is because the task threads do not end up blocking for as many as 3 seconds for their last progress update event. We also noticed close to 14% improvement on Mahout KMeans iteration jobs which take more than 5 minutes on the test cluster that we are using.
> Please let me know if this seems to be a good idea. I have an initial patch that I have attached here. This is based on branch-1 tree. It may need some rework on MRv2 based branches I think.
> Also note that I have not changed the variable naming style for PROGRESS_INTERVAL even though it is not a public static final anymore. I can revise the patch if there are no objections to this idea.
> Thanks.
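
The lower-bound idea raised in the comment could be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: it uses java.util.Properties as a stand-in for JobConf so that it runs without Hadoop on the classpath, and the property key mapred.task.progress.interval, the class name, and the method name are all hypothetical, since the thread does not name the key the patch adds. The 3000 msec default and the 500 msec lower bound are the values mentioned in the thread. Clamping is one possible policy; rejecting out-of-range values with an error would be another.

```java
import java.util.Properties;

public class ProgressIntervalSketch {
    // Hypothetical key; the thread does not state the property name used by the patch.
    static final String PROGRESS_INTERVAL_KEY = "mapred.task.progress.interval";
    // Hard-coded value cited in the issue description.
    static final long DEFAULT_PROGRESS_INTERVAL_MS = 3000L;
    // Lower bound proposed in the comment.
    static final long MIN_PROGRESS_INTERVAL_MS = 500L;

    // Returns the configured interval, falling back to the default when the
    // key is unset or unparsable, and clamping small values to the lower bound.
    static long resolveProgressInterval(Properties conf) {
        String raw = conf.getProperty(PROGRESS_INTERVAL_KEY);
        if (raw == null) {
            return DEFAULT_PROGRESS_INTERVAL_MS;
        }
        long requested;
        try {
            requested = Long.parseLong(raw.trim());
        } catch (NumberFormatException e) {
            return DEFAULT_PROGRESS_INTERVAL_MS;
        }
        return Math.max(requested, MIN_PROGRESS_INTERVAL_MS);
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Key unset: the default applies.
        System.out.println(resolveProgressInterval(conf));

        // A value above the lower bound is returned unchanged.
        conf.setProperty(PROGRESS_INTERVAL_KEY, "1000");
        System.out.println(resolveProgressInterval(conf));

        // A value below the lower bound is clamped to it.
        conf.setProperty(PROGRESS_INTERVAL_KEY, "100");
        System.out.println(resolveProgressInterval(conf));
    }
}
```

A test along the lines suggested in the comment would set the key to a value at or above the bound and check that the same value comes back, plus one case below the bound to confirm the clamp.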