From: "Hairong Kuang (JIRA)"
To: core-dev@hadoop.apache.org
Reply-To: core-dev@hadoop.apache.org
Date: Wed, 29 Oct 2008 13:58:44 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-4483) getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value
Message-ID: <153394854.1225313924422.JavaMail.jira@brutus>
In-Reply-To: <523896621.1224621524601.JavaMail.jira@brutus>

    [ https://issues.apache.org/jira/browse/HADOOP-4483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12643650#action_12643650 ]

Hairong Kuang commented on
HADOOP-4483:
---------------------------------------

JUnit tests passed on my local machine:

BUILD SUCCESSFUL
Total time: 113 minutes 11 seconds

Ant test-patch result:

     [exec] +1 overall.
     [exec]     +1 @author.  The patch does not contain any @author tags.
     [exec]     +1 tests included.  The patch appears to include 3 new or modified tests.
     [exec]     +1 javadoc.  The javadoc tool did not generate any warning messages.
     [exec]     +1 javac.  The applied patch does not increase the total number of javac compiler warnings.
     [exec]     +1 findbugs.  The patch does not introduce any new Findbugs warnings.
     [exec]     +1 Eclipse classpath.  The patch retains Eclipse classpath integrity.

> getBlockArray in DatanodeDescriptor does not honor passed in maxblocks value
> ----------------------------------------------------------------------------
>
>                 Key: HADOOP-4483
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4483
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.1
>         Environment: hadoop-0.18.1 running on a cluster of 16 nodes.
>            Reporter: Ahad Rana
>            Priority: Critical
>             Fix For: 0.18.2
>
>         Attachments: HADOOP-4483-v2.patch, HADOOP-4483-v3.patch, HADOOP-4483-v3.patch, invalidateBlocksCopy.patch, patch.HADOOP-4483
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> The getBlockArray method in DatanodeDescriptor.java should honor the passed-in maxblocks parameter. In its current form it passes an array sized to min(maxblocks, blocks.size()) into the Collection.toArray method. As the javadoc for Collection.toArray indicates, the toArray method may discard the passed-in array (and allocate a new array) if the number of elements returned by the iterator exceeds the size of the passed-in array. As a result, the flawed implementation of this method would return all the invalid blocks for a data node in one go, and thus trigger the NameNode to send a DNA_INVALIDATE command to the DataNode with an excessively large number of blocks.
> This INVALIDATE command, in turn, could potentially take a very long time to process at the DataNode, and since DatanodeCommand(s) are processed in between heartbeats at the DataNode, this would trigger the NameNode to consider the DataNode to be offline / unresponsive (due to a lack of heartbeats).
> In our use case at CommonCrawl.org, we regularly do large-scale HDFS file deletions after certain stages of our map-reduce pipeline. These deletes would make certain DataNode(s) unresponsive, and thus impact the cluster's capability to properly balance file-system reads / writes across the whole available cluster. This problem only surfaced once we migrated from our 0.16.2 deployment to the current 0.18.1 release.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
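
The toArray behaviour described in the issue can be reproduced in a few lines. What follows is only a minimal sketch, not the Hadoop source and not the attached patch: the class and method names (BoundedToArrayDemo, flawedTake, boundedTake) are hypothetical, and String stands in for Hadoop's Block type. It shows that passing an undersized array to toArray still yields every element, and one possible way to copy at most maxBlocks elements instead.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

public class BoundedToArrayDemo {

    /**
     * Hypothetical stand-in for the flawed pattern: the array is sized to
     * min(maxBlocks, size), but toArray allocates a new, larger array when
     * the collection does not fit, so maxBlocks is silently ignored.
     */
    static String[] flawedTake(Collection<String> blocks, int maxBlocks) {
        int n = Math.max(0, Math.min(maxBlocks, blocks.size()));
        return blocks.toArray(new String[n]);
    }

    /**
     * One possible bounded variant (an assumption, not the actual fix):
     * copy elements one at a time and stop after n of them.
     */
    static String[] boundedTake(Collection<String> blocks, int maxBlocks) {
        int n = Math.max(0, Math.min(maxBlocks, blocks.size()));
        String[] out = new String[n];
        Iterator<String> it = blocks.iterator();
        for (int i = 0; i < n; i++) {
            out[i] = it.next();
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> blocks =
            new ArrayList<>(Arrays.asList("b1", "b2", "b3", "b4", "b5"));

        // The undersized array is discarded; all five elements come back.
        System.out.println(flawedTake(blocks, 2).length);  // 5

        // The explicit loop honors the limit.
        System.out.println(boundedTake(blocks, 2).length); // 2
    }
}
```

This sketch leaves the source collection untouched. A caller that wants to hand out blocks in batches would also need to remove the returned entries (for example with Iterator.remove) so that the same blocks are not returned again on the next call.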