Mailing-List: contact core-dev-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: core-dev@hadoop.apache.org
Message-ID: <371688130.1234164779666.JavaMail.jira@brutus>
Date: Sun, 8 Feb 2009 23:32:59 -0800 (PST)
From: "Matei Zaharia (JIRA)" <jira@apache.org>
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-5185) Update thread in FairScheduler runs
 too frequently
In-Reply-To: <1244652775.1233913199592.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


    [ https://issues.apache.org/jira/browse/HADOOP-5185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12671754#action_12671754 ] 

Matei Zaharia commented on HADOOP-5185:
---------------------------------------

As a temporary fix, feel free to submit a patch that scales up the interval based on cluster size or heartbeat interval. Alternatively, if there is a way to make getTotalSlots non-synchronized or to cache its result, we should do that, since there is no reason to call this method so frequently.
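
To make the two options concrete, here is a minimal sketch, not a patch. It assumes a linear scaling rule (one 500ms step per 100 TaskTrackers, a number picked only for illustration) and a small time-based cache in front of an expensive call such as getTotalSlots. The class names UpdateIntervalSketch and CachedSlotCount, the constants, and the injected clock are all hypothetical and do not exist in FairScheduler.

```java
import java.util.function.IntSupplier;
import java.util.function.LongSupplier;

/** Hypothetical helper: derives the update interval from cluster size. */
final class UpdateIntervalSketch {
    static final long BASE_INTERVAL_MS = 500; // the hardcoded value named in the issue
    static final int NODES_PER_STEP = 100;    // assumed granularity, illustration only

    /** Returns BASE_INTERVAL_MS for each started block of NODES_PER_STEP trackers. */
    static long scaledIntervalMs(int numTaskTrackers) {
        long steps = Math.max(1, (numTaskTrackers + NODES_PER_STEP - 1) / NODES_PER_STEP);
        return BASE_INTERVAL_MS * steps;
    }
}

/** Hypothetical cache: reloads the wrapped value at most once per ttlMs. */
final class CachedSlotCount {
    private final IntSupplier loader;    // stands in for the expensive, synchronized call
    private final LongSupplier clockMs;  // injected so the sketch can be tested
    private final long ttlMs;
    private int cached;
    private long loadedAtMs;
    private boolean valid;

    CachedSlotCount(IntSupplier loader, LongSupplier clockMs, long ttlMs) {
        this.loader = loader;
        this.clockMs = clockMs;
        this.ttlMs = ttlMs;
    }

    /** Locks only this small object; the loader runs only when the entry is stale. */
    synchronized int get() {
        long now = clockMs.getAsLong();
        if (!valid || now - loadedAtMs >= ttlMs) {
            cached = loader.getAsInt();
            loadedAtMs = now;
            valid = true;
        }
        return cached;
    }
}
```

With these assumed constants, a 2000-node cluster would update every 10 seconds instead of every 500ms, and the slot count would be recomputed at most once per cache lifetime regardless of how often it is read.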

Incidentally, if we change the fair scheduler logic to stop using deficits (which I'm proposing in HADOOP-4803, and which seems like a better idea the more I think about it), the update thread could run much less frequently. It runs so often now to keep the deficit computations accurate, so that not too many tasks per job start or finish between update calls. If we removed deficits, I think the main reason for periodic updates would be preemption, and that check can happen much less frequently.
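
To illustrate why the update period affects deficit accuracy, here is a minimal sketch. It assumes each update adds (fairShare - runningTasks) * elapsedTime to a job's deficit, using only the running-task count visible at update time. The class DeficitSketch and its toy workload (a job that goes from 0 to 4 running tasks at t = 250ms) are hypothetical and are not the scheduler's actual code.

```java
/** Hypothetical model of deficit accumulation at a fixed update interval. */
final class DeficitSketch {
    /** Toy workload: the job runs 0 tasks before 250ms and 4 tasks from then on. */
    static int runningTasksAt(long tMs) {
        return tMs < 250 ? 0 : 4;
    }

    /**
     * Accumulates the deficit over [0, totalMs]. Each update samples the running
     * count once and charges it against the whole interval that just elapsed.
     */
    static double deficitAfter(long totalMs, long updateIntervalMs, double fairShare) {
        double deficit = 0;
        for (long t = updateIntervalMs; t <= totalMs; t += updateIntervalMs) {
            deficit += (fairShare - runningTasksAt(t)) * updateIntervalMs;
        }
        return deficit;
    }
}
```

For a fair share of 2 over 500ms, the exact deficit of this workload is 0 (250ms at +2, then 250ms at -2). In the sketch, a single 500ms update yields -1000 slot-milliseconds, while 50ms updates yield -200, so the error shrinks with the update interval. Without deficits there is no such integral to keep accurate.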

> Update thread in FairScheduler runs too frequently
> --------------------------------------------------
>
>                 Key: HADOOP-5185
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5185
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: contrib/fair-share
>            Reporter: Vinod K V
>
> The UpdateThread in FairScheduler runs every 500ms (hardcoded). This proves very costly on large clusters: UpdateThread tries to acquire the lock on the JT object that often, and so seriously affects heartbeat processing, among everything else. The update interval should be a function of the cluster size. Or, at a minimum, it should be configurable, with a reasonably high default value.
