Message-ID: <14944929.1177536255334.JavaMail.jira@brutus>
Date: Wed, 25 Apr 2007 14:24:15 -0700 (PDT)
From: "dhruba borthakur (JIRA)"
To: hadoop-dev@lucene.apache.org
Reply-To: hadoop-dev@lucene.apache.org
Subject: [jira] Created: (HADOOP-1297) datanode sending block reports to namenode once every second
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

datanode sending block reports to namenode once every second
-------------------------------------------------------------

                 Key: HADOOP-1297
                 URL: https://issues.apache.org/jira/browse/HADOOP-1297
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
            Reporter: dhruba borthakur
         Assigned To: dhruba borthakur

The namenode requests that a block be deleted. The datanode attempts the deletion and encounters an error because the block is not in the blockMap, so the processCommand() method raises an exception. Because lastBlockReport is set only after processCommand() returns, the exception leaves lastBlockReport unchanged and the datanode immediately sends another block report to the namenode. This eats up quite a bit of CPU on the namenode. In short, this condition causes the datanode to send block reports almost once every second!

I propose that we do the following:

1. In DataNode.offerService, replace the following piece of code

       DatanodeCommand cmd = namenode.blockReport(dnRegistration,
                                                  data.getBlockReport());
       processCommand(cmd);
       lastBlockReport = now;

   with

       DatanodeCommand cmd = namenode.blockReport(dnRegistration,
                                                  data.getBlockReport());
       lastBlockReport = now;
       processCommand(cmd);

   so that a failing command no longer shortens the block report interval. (A standalone sketch illustrating the effect of this reordering is appended after the JIRA footer below.)

2. In FSDataSet.invalidate:

   a) continue to process all blocks in invalidBlks[] even if one of them in the middle encounters a problem.

   b) if getFile() returns null, still invoke volumeMap.get() and print whether we found the block in the volumeMap or not. The volumeMap is used to generate the block report, so this might help in debugging.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
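
Appendix: a minimal, self-contained Java sketch of the scheduling effect described in item 1. Everything in it is invented for illustration (the BlockReportSchedulingSketch class, the stand-in sendBlockReport()/processCommand() methods, and the interval constant); only the ordering of the lastBlockReport assignment relative to processCommand() mirrors the proposal above. It is not the actual DataNode.offerService() code.

    // Standalone illustration, not Hadoop code: stand-in names and values
    // are hypothetical; only the ordering of lastBlockReport vs.
    // processCommand() reflects the proposal in the issue.
    public class BlockReportSchedulingSketch {

        // Hypothetical reporting interval; the real DataNode uses its own
        // configured value.
        static final long BLOCK_REPORT_INTERVAL_MS = 3_600_000L;

        static long lastBlockReport = 0;
        static int reportsSent = 0;

        // Stand-in for namenode.blockReport(dnRegistration, data.getBlockReport()).
        static String sendBlockReport() {
            reportsSent++;
            return "INVALIDATE_BLOCKS"; // pretend the namenode asked us to delete a block
        }

        // Stand-in for processCommand(cmd); always fails, like a delete of a
        // block that is missing from the blockMap.
        static void processCommand(String cmd) {
            throw new IllegalStateException("block not in blockMap: " + cmd);
        }

        // Current ordering: lastBlockReport is updated only if processCommand()
        // returns normally, so the exception forces another report on the next pass.
        static void offerServiceCurrent(long now) {
            if (now - lastBlockReport > BLOCK_REPORT_INTERVAL_MS) {
                try {
                    String cmd = sendBlockReport();
                    processCommand(cmd);
                    lastBlockReport = now;   // never reached when processCommand() throws
                } catch (Exception e) {
                    // swallowed for the demo; the next pass retries immediately
                }
            }
        }

        // Proposed ordering: record the report time before acting on the command,
        // so a failing command does not shorten the reporting interval.
        static void offerServiceProposed(long now) {
            if (now - lastBlockReport > BLOCK_REPORT_INTERVAL_MS) {
                try {
                    String cmd = sendBlockReport();
                    lastBlockReport = now;   // set first, as proposed in item 1
                    processCommand(cmd);
                } catch (Exception e) {
                    // command failed, but the next report still waits a full interval
                }
            }
        }

        public static void main(String[] args) {
            // Simulate ten one-second passes of the service loop, starting just
            // past the point where a block report is due.
            long start = BLOCK_REPORT_INTERVAL_MS + 1;

            lastBlockReport = 0;
            reportsSent = 0;
            for (long now = start; now < start + 10_000; now += 1000) {
                offerServiceCurrent(now);
            }
            System.out.println("current ordering:  " + reportsSent + " block reports in 10 seconds");

            lastBlockReport = 0;
            reportsSent = 0;
            for (long now = start; now < start + 10_000; now += 1000) {
                offerServiceProposed(now);
            }
            System.out.println("proposed ordering: " + reportsSent + " block report(s) in 10 seconds");
        }
    }

Running the sketch, the current ordering sends a report on every pass (about one per simulated second), while the proposed ordering sends one report and then waits out the full interval, which matches the behaviour reported against the namenode.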