From: "Raghu Angadi (JIRA)"
To: core-dev@hadoop.apache.org
Date: Fri, 10 Apr 2009 13:17:15 -0700 (PDT)
Subject: [jira] Updated: (HADOOP-4584) Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
Message-ID: <82074596.1239394635819.JavaMail.jira@brutus>
In-Reply-To: <178281774.1225759904336.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HADOOP-4584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Raghu Angadi updated
HADOOP-4584:
---------------------------------

       Resolution: Fixed
    Fix Version/s:     (was: 0.20.0)
                   0.21.0
     Release Note: Improve datanode block reports and the associated file system scan to avoid interfering with normal datanode operations. Large datanodes with many blocks should be handled much better now.
           Status: Resolved  (was: Patch Available)

I just committed this. Thanks Suresh.

> Slow generation of blockReport at DataNode causes delay of sending heartbeat to NameNode
> ----------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4584
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4584
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Suresh Srinivas
>             Fix For: 0.21.0
>
>         Attachments: 4584.brthread.2.patch, 4584.brthread.3.patch, 4584.brthread.3.patch, 4584.brthread.3.patch, 4584.brthread.3.patch, 4584.brthread.3.patch, 4584.brthread.4.patch, 4584.brthread.4.patch, 4584.brthread.4.patch, 4584.brthread.5.patch, 4584.brthread.5.patch, 4584.hbthread.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, 4584.patch, Design.pdf, Design.pdf
>
>
> Sometimes, due to disk or other problems, a datanode takes minutes or tens of minutes to generate a block report. This prevents the datanode from sending a heartbeat to the NameNode every 3 seconds. In the worst case, the NameNode detects a lost heartbeat and wrongly decides that the datanode is dead.
> It would be nice to have two threads instead. One thread scans the data directories, generates the block report, and executes the requests sent by the NameNode; the other thread sends heartbeats and block reports and picks up requests from the NameNode. With these two threads, the sending of heartbeats will not be delayed by a slow block report or slow execution of NameNode requests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
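
For readers of the archive, the two-thread idea proposed in the issue description can be illustrated roughly as follows. This is only a minimal sketch and not the committed patch: the class HeartbeatSketch, the method run, the thread names, and the block ids are all hypothetical, and sleeps and counters stand in for the real directory scan and the RPCs to the NameNode. One thread does the slow scan and hands the finished report over through a shared reference; the other thread keeps sending heartbeats on its own schedule and forwards the report once it is ready.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

public class HeartbeatSketch {

    /**
     * Runs one slow "scan" on a worker thread while a second thread keeps
     * sending heartbeats. Returns {heartbeatsSent, blockReportsSent}.
     * All names here are hypothetical; this is not the Hadoop DataNode API.
     */
    static int[] run(long scanMillis, long heartbeatIntervalMillis)
            throws InterruptedException {
        AtomicInteger heartbeats = new AtomicInteger();
        AtomicInteger reportsSent = new AtomicInteger();
        // Hand-off slot: the scanner publishes a finished report here.
        AtomicReference<long[]> pendingReport = new AtomicReference<>();
        CountDownLatch reportDelivered = new CountDownLatch(1);

        Thread scanner = new Thread(() -> {
            try {
                Thread.sleep(scanMillis);                    // stand-in for a slow disk scan
                pendingReport.set(new long[] {1L, 2L, 3L});  // made-up block ids
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "block-scanner");

        Thread heartbeater = new Thread(() -> {
            try {
                while (reportDelivered.getCount() > 0) {
                    heartbeats.incrementAndGet();            // stand-in for the heartbeat RPC
                    long[] report = pendingReport.getAndSet(null);
                    if (report != null) {
                        reportsSent.incrementAndGet();       // stand-in for the block report RPC
                        reportDelivered.countDown();
                    }
                    Thread.sleep(heartbeatIntervalMillis);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "heartbeat-sender");

        scanner.start();
        heartbeater.start();
        scanner.join();
        heartbeater.join();
        return new int[] {heartbeats.get(), reportsSent.get()};
    }

    public static void main(String[] args) throws InterruptedException {
        int[] counts = run(500, 50);
        System.out.println("heartbeats=" + counts[0] + " reports=" + counts[1]);
    }
}
```

The point of the hand-off slot is that the heartbeat loop never blocks on the scan: it only checks whether a report is ready, so a scan that takes minutes delays the report but not the heartbeats.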