From: "Matei Zaharia (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Sun, 8 Feb 2009 23:32:59 -0800 (PST)
Subject: [jira] Commented: (HADOOP-5185) Update thread in FairScheduler runs too frequently
Message-ID: <371688130.1234164779666.JavaMail.jira@brutus>
In-Reply-To: <1244652775.1233913199592.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/HADOOP-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671754#action_12671754 ]

Matei Zaharia commented on HADOOP-5185:
---------------------------------------

As a temporary fix, feel free to submit a patch that scales up the interval based on cluster size or heartbeat interval. Alternatively, if there is a way to make getTotalSlots non-synchronized or to cache its result, we should do that, as there is no reason to call this method all the time.

Incidentally, if we change the fair scheduler logic to stop using deficits (which I'm proposing in HADOOP-4803, and which seems like a better idea the more I think about it), the update thread could run much less frequently. It runs so often now to keep the deficit computations accurate, so that we don't have too many tasks per job starting or finishing between update calls. If we removed deficits, I think the main reason we'd need periodic updates would be preemption, and that check can happen much less frequently.

> Update thread in FairScheduler runs too frequently
> -------------------------------------------------
>
>                 Key: HADOOP-5185
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5185
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Vinod K V
>
> The UpdateThread in FairScheduler runs every 500ms (hardcoded). This proves to be very costly on large clusters. UpdateThread tries to acquire the lock on the JT object that often, and so seriously affects heartbeat processing, besides everything else. The update interval should be a function of the cluster size. Or, at a minimum, it should be configurable and set to a reasonably high default value.
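
The two stopgaps suggested in the comment (scaling the update interval with cluster size, and caching the slot count instead of recomputing it under a lock on every update) could look roughly like the sketch below. This is not the FairScheduler code and does not use any Hadoop API: the class name, the NODES_PER_STEP constant, the linear scaling rule, and the CachedSlots helper are all hypothetical, and the IntSupplier merely stands in for an expensive call such as getTotalSlots. Only the 500 ms base value comes from the issue description.

```java
import java.util.function.IntSupplier;

// Hypothetical sketch; none of these names exist in Hadoop.
class UpdateIntervalSketch {
    // The hardcoded interval reported in the issue.
    static final long BASE_INTERVAL_MS = 500;
    // Assumed tuning knob: add one base interval per 100 nodes.
    static final int NODES_PER_STEP = 100;

    /** Returns an interval that grows linearly with cluster size, never below the base. */
    static long scaledIntervalMs(int numNodes) {
        int steps = Math.max(1, (numNodes + NODES_PER_STEP - 1) / NODES_PER_STEP);
        return BASE_INTERVAL_MS * steps;
    }

    /** Caches an expensive count and refreshes it at most once per ttlMs. */
    static final class CachedSlots {
        private final IntSupplier source; // stand-in for a costly, synchronized lookup
        private final long ttlMs;
        private int value;
        private long lastRefreshMs;
        private boolean loaded;

        CachedSlots(IntSupplier source, long ttlMs) {
            this.source = source;
            this.ttlMs = ttlMs;
        }

        /** The caller passes the current time so the cache is easy to test. */
        synchronized int get(long nowMs) {
            if (!loaded || nowMs - lastRefreshMs >= ttlMs) {
                value = source.getAsInt();
                lastRefreshMs = nowMs;
                loaded = true;
            }
            return value;
        }
    }

    public static void main(String[] args) {
        for (int nodes : new int[] {50, 500, 2000}) {
            System.out.println(nodes + " nodes -> " + scaledIntervalMs(nodes) + " ms");
        }
        CachedSlots slots = new CachedSlots(() -> 42, 1000);
        System.out.println("slots = " + slots.get(System.currentTimeMillis()));
    }
}
```

With these assumed constants a small cluster keeps the 500 ms interval, while larger clusters back off proportionally. The cache only bounds how often the expensive lookup runs; callers would have to tolerate a slot count that is stale by up to ttlMs.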