From: "Sanjay Dahiya (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Thu, 5 Oct 2006 13:48:21 -0700 (PDT)
Subject: [jira] Updated: (HADOOP-506) job tracker hangs on to dead task trackers "forever"

[ http://issues.apache.org/jira/browse/HADOOP-506?page=all ]

Sanjay Dahiya updated HADOOP-506:
---------------------------------

       Status: Patch Available  (was: In Progress)
Fix Version/s: 0.7.0
> job tracker hangs on to dead task trackers "forever"
> ----------------------------------------------------
>
>                 Key: HADOOP-506
>                 URL: http://issues.apache.org/jira/browse/HADOOP-506
>             Project: Hadoop
>          Issue Type: Bug
>          Components: mapred
>            Reporter: Yoram Arnon
>         Assigned To: Sanjay Dahiya
>            Priority: Minor
>             Fix For: 0.7.0
>
>         Attachments: Hadoop-506.patch
>
>
> I see cases where a task tracker gets disconnected from the job tracker and reconnects, and then appears twice in the job tracker's list: one instance is alive and well, while the other's 'time since last heartbeat' increases monotonically.
> That all makes sense.
> What doesn't make sense is that the old instances never expire. It's been over 400,000 seconds since the last heartbeat, and the cluster reports more nodes up and running than it actually has (350 nodes in a 320-node cluster).
> There should be some reasonable timeout for these expired task trackers, somewhere between 10 minutes and an hour.
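The timeout requested in the report amounts to a periodic expiry sweep over the last-heartbeat times the job tracker already tracks. The following Java sketch illustrates that idea under stated assumptions. `TrackerRegistry`, `heartbeat`, `expireDeadTrackers`, and the tracker names are hypothetical. This is not the JobTracker code or the contents of Hadoop-506.patch. The 10-minute interval is simply the lower bound suggested in the report.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of heartbeat-based tracker expiry; not Hadoop's actual code.
public class TrackerRegistry {
    // Last heartbeat time (ms) per task tracker instance, keyed by a hypothetical name.
    private final Map<String, Long> lastHeartbeat = new HashMap<>();
    private final long expiryIntervalMs;

    public TrackerRegistry(long expiryIntervalMs) {
        this.expiryIntervalMs = expiryIntervalMs;
    }

    // Record a heartbeat. A tracker that reconnects under a new name gets a
    // new entry, which is how the duplicate instances in the report arise.
    public synchronized void heartbeat(String trackerName, long nowMs) {
        lastHeartbeat.put(trackerName, nowMs);
    }

    // Drop every tracker whose last heartbeat is older than the expiry
    // interval and return the names that were removed.
    public synchronized List<String> expireDeadTrackers(long nowMs) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastHeartbeat.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() > expiryIntervalMs) {
                expired.add(entry.getKey());
                it.remove(); // safe removal while iterating
            }
        }
        return expired;
    }

    // Number of trackers still considered alive (what the cluster size should report).
    public synchronized int liveTrackerCount() {
        return lastHeartbeat.size();
    }

    public static void main(String[] args) {
        long tenMinutes = 10 * 60 * 1000L;
        TrackerRegistry registry = new TrackerRegistry(tenMinutes);
        registry.heartbeat("tracker_node1_old", 0L);     // instance before the disconnect
        registry.heartbeat("tracker_node1_new", 1_000L); // same node after reconnecting
        // Only the new instance keeps sending heartbeats.
        registry.heartbeat("tracker_node1_new", tenMinutes + 1_000L);
        List<String> expired = registry.expireDeadTrackers(tenMinutes + 2_000L);
        System.out.println(expired + " " + registry.liveTrackerCount());
    }
}
```

Running `main` prints `[tracker_node1_old] 1`. Only the stale instance is dropped, so the live count no longer exceeds the number of physical nodes. In a real job tracker, a background thread would presumably call `expireDeadTrackers` on a schedule. The sweep takes an explicit timestamp here so that it can be exercised deterministically.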