Message-ID: <27777180.1192704050866.JavaMail.jira@brutus>
Date: Thu, 18 Oct 2007 03:40:50 -0700 (PDT)
From: "Arun C Murthy (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Subject: [jira] Commented: (HADOOP-1900) the heartbeat and task event queries interval should be set dynamically by the JobTracker
In-Reply-To: <30973750.1189796072225.JavaMail.jira@brutus>

[ https://issues.apache.org/jira/browse/HADOOP-1900?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12535879 ]

Arun C Murthy commented on HADOOP-1900:
---------------------------------------

bq. I wonder if instead we should just make it clusterSize/50+1? That way, small clusters will get a heartbeat of just one second, which should make them more responsive.

+1

I'd like to see some numbers on how long it takes to process a heartbeat, etc., before we decide on the actual scaling factors (both up and down). Given that we have so far run on clusters of 2000 nodes with a heartbeat interval of 10s, I suspect that scaling it up by 10s for every 500 nodes is too conservative. Anyway, I'll believe the numbers when we have them.

Also, while we are at it, I say we should start to consider the *busy-ness* of the JobTracker too, along with the cluster size. For example, if individual tasks take on the order of minutes, it might not matter much whether a heartbeat is sent every 20s or so, though in some cases it might. I know that the sort's map tasks take around 40s each. One way to take this into account might be to maintain an average time-to-complete for all tasks in the system (of current jobs) and factor that into the scaling of the intervals.

> the heartbeat and task event queries interval should be set dynamically by the JobTracker
> -----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-1900
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1900
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Owen O'Malley
>            Assignee: Amareshwari Sri Ramadasu
>
> The JobTracker should scale the intervals that the TaskTrackers use to contact it dynamically, based on how busy it is and the size of the cluster.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
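
For illustration only, the two ideas in the comment above (the quoted clusterSize/50+1 rule and an average time-to-complete for tasks) can be sketched as follows. This is a rough sketch in Python rather than Hadoop's Java, and it is not taken from any Hadoop patch. The names heartbeat_interval_secs, TaskTimeAverage and adjusted_interval_secs are hypothetical, and so is the way the average is combined with the size-based interval (capping at a quarter of the average task time); the comment only suggests factoring the average in, without saying how.

```python
from typing import Optional


def heartbeat_interval_secs(cluster_size: int, nodes_per_second: int = 50) -> int:
    """Size-based interval: clusterSize/50 + 1, using integer division."""
    if cluster_size < 0:
        raise ValueError("cluster_size must be non-negative")
    return cluster_size // nodes_per_second + 1


class TaskTimeAverage:
    """Running mean of task completion times in seconds (hypothetical helper)."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def record(self, task_secs: float) -> None:
        self.total += task_secs
        self.count += 1

    def mean(self) -> Optional[float]:
        # None signals that no task has completed yet.
        return self.total / self.count if self.count else None


def adjusted_interval_secs(cluster_size: int,
                           avg_task_secs: Optional[float],
                           max_fraction: float = 0.25) -> int:
    """Assumed policy: never wait longer than a fraction of the average task time."""
    base = heartbeat_interval_secs(cluster_size)
    if avg_task_secs is None:
        return base
    cap = max(1, int(avg_task_secs * max_fraction))
    return min(base, cap)
```

Under these assumptions, a 2000-node cluster gets a size-based interval of 41s; with 40s map tasks the assumed cap brings it down to 10s, while tasks that run for minutes leave the size-based value unchanged. Whether a cap like this is acceptable depends on how long the JobTracker takes to process a heartbeat, which is exactly the measurement the comment asks for.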