From: "Tsz Wo (Nicholas), SZE (JIRA)"
To: hdfs-issues@hadoop.apache.org
Date: Fri, 16 Oct 2009 12:01:32 -0700 (PDT)
Subject: [jira] Commented: (HDFS-668) TestFileAppend3#TC7 sometimes hangs

    [ https://issues.apache.org/jira/browse/HDFS-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766660#action_12766660 ]

Tsz Wo (Nicholas), SZE commented on HDFS-668:
---------------------------------------------

- In BlocksMap,
{code}
+   * Update the old block with the new block.
+   *
+   * The new block has a newer generation stamp so it requires remove
+   * the old entry first and reinsert the new entry
+   *
+   * @return the removed stored block in the map
+   */
+  BlockInfo updateBlock(Block oldBlock, Block newBlock) {
+    BlockInfo blockInfo = map.remove(oldBlock);
+    blockInfo.setGenerationStamp(newBlock.getGenerationStamp());
+    blockInfo.setNumBytes(newBlock.getNumBytes());
+    map.put(blockInfo, blockInfo);
+    return blockInfo;
+  }
{code}
-* It is better to check oldBlock.getBlockId() == newBlock.getBlockId(), or to change updateBlock(..) to updateBlock(Block b, long newGenerationStamp, long newLength); see the sketch below.
-* The stored block is added back, so the javadoc "@return the removed stored block in the map" sounds incorrect.
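For illustration only, a rough sketch of the second option. It just reuses the map field and the BlockInfo setters from the quoted patch, and assumes, as the patch does, that the block is currently in the map; it is not tested code:
{code}
  /**
   * Update the stored block with a new generation stamp and length.
   * Only one Block is passed in, so there is no second block id that
   * could disagree with the stored entry.
   *
   * @return the updated stored block in the map
   */
  BlockInfo updateBlock(Block b, long newGenerationStamp, long newLength) {
    // As in the patch: remove the entry keyed by the old block, mutate the
    // stored BlockInfo, and reinsert it under the new generation stamp.
    BlockInfo blockInfo = map.remove(b);
    blockInfo.setGenerationStamp(newGenerationStamp);
    blockInfo.setNumBytes(newLength);
    map.put(blockInfo, blockInfo);
    return blockInfo;
  }
{code}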
- In FSNamesystem,
{code}
@@ -1399,6 +1399,9 @@
       //
       for (BlockInfo block: v.getBlocks()) {
         if (!blockManager.checkMinReplication(block)) {
+          NameNode.stateChangeLog.info("BLOCK* NameSystem.checkFileProgress: "
+              + "block " + block + " has not reached minimal replication "
+              + blockManager.minReplication);
           return false;
         }
       }
@@ -1408,6 +1411,9 @@
       //
       BlockInfo b = v.getPenultimateBlock();
       if (b != null && !blockManager.checkMinReplication(b)) {
+        NameNode.stateChangeLog.info("BLOCK* NameSystem.checkFileProgress: "
+            + "block " + b + " has not reached minimal replication "
+            + blockManager.minReplication);
         return false;
       }
     }
{code}
-* These two log messages do not look like "state changes". Should we use FSNamesystem.LOG instead?

- In FSNamesystem,
{code}
-    final BlockInfo oldblockinfo = pendingFile.getLastBlock();
+    final BlockInfoUnderConstruction blockinfo = pendingFile.getLastBlock();
{code}
-* Could blockinfo be null?
-* Is it the case that the last block must be a BlockInfoUnderConstruction? I am afraid that an IOException caused by a ClassCastException may be thrown by getLastBlock(). The existing code shown below looks incorrect: it first suppresses unchecked warnings and then converts the ClassCastException to an IOException. This makes it very hard to use. How can the caller handle such an IOException? (A rough sketch of one alternative is appended after the quoted issue description below.)
{code}
  //INodeFile
  T getLastBlock() throws IOException {
    if (blocks == null || blocks.length == 0)
      return null;
    T returnBlock = null;
    try {
      @SuppressWarnings("unchecked")  // ClassCastException is caught below
      T tBlock = (T)blocks[blocks.length - 1];
      returnBlock = tBlock;
    } catch(ClassCastException cce) {
      throw new IOException("Unexpected last block type: "
          + blocks[blocks.length - 1].getClass().getSimpleName());
    }
    return returnBlock;
  }
{code}

> TestFileAppend3#TC7 sometimes hangs
> -----------------------------------
>
>                 Key: HDFS-668
>                 URL: https://issues.apache.org/jira/browse/HDFS-668
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>    Affects Versions: 0.21.0
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>             Fix For: 0.21.0
>
>         Attachments: hdfs-668.patch, loop.patch
>
>
> TestFileAppend3 hangs because it fails to close the file. The following is the snippet of logs that shows the cause of the problem:
> [junit] 2009-10-01 07:00:00,719 WARN  hdfs.DFSClient (DFSClient.java:setupPipelineForAppendOrRecovery(3004)) - Error Recovery for block blk_-4098350497078465335_1007 in pipeline 127.0.0.1:58375, 127.0.0.1:36982: bad datanode 127.0.0.1:36982
> [junit] 2009-10-01 07:00:00,721 INFO  datanode.DataNode (DataXceiver.java:opWriteBlock(224)) - Receiving block blk_-4098350497078465335_1007 src: /127.0.0.1:40252 dest: /127.0.0.1:58375
> [junit] 2009-10-01 07:00:00,721 INFO  datanode.DataNode (FSDataset.java:recoverClose(1248)) - Recover failed close blk_-4098350497078465335_1007
> [junit] 2009-10-01 07:00:00,723 INFO  datanode.DataNode (DataXceiver.java:opWriteBlock(369)) - Received block blk_-4098350497078465335_1008 src: /127.0.0.1:40252 dest: /127.0.0.1:58375 of size 65536
> [junit] 2009-10-01 07:00:00,724 INFO  hdfs.StateChange (BlockManager.java:addStoredBlock(1006)) - BLOCK* NameSystem.addStoredBlock: addStoredBlock request received for blk_-4098350497078465335_1008 on 127.0.0.1:58375 size 65536 But it does not belong to any file.
> [junit] 2009-10-01 07:00:00,724 INFO  namenode.FSNamesystem (FSNamesystem.java:updatePipeline(3946)) - updatePipeline(block=blk_-4098350497078465335_1007, newGenerationStamp=1008, newLength=65536, newNodes=[127.0.0.1:58375], clientName=DFSClient_995688145)
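Returning to the getLastBlock() point above, a rough sketch of one possible alternative: make the expectation explicit at the call site instead of converting a ClassCastException inside INodeFile. This assumes blocks is the BlockInfo[] field of INodeFile; the src variable in the message is only a placeholder, not code from the patch:
{code}
  // INodeFile (sketch): return the raw last block; no unchecked cast and no
  // ClassCastException-to-IOException conversion inside the INode.
  BlockInfo getLastBlock() {
    return (blocks == null || blocks.length == 0) ? null
        : blocks[blocks.length - 1];
  }

  // Caller side (sketch), e.g. where the patch currently does
  //   final BlockInfoUnderConstruction blockinfo = pendingFile.getLastBlock();
  BlockInfo lastBlock = pendingFile.getLastBlock();
  if (!(lastBlock instanceof BlockInfoUnderConstruction)) {
    throw new IOException("Last block of " + src + " is not under construction: "
        + (lastBlock == null ? "null" : lastBlock.getClass().getSimpleName()));
  }
  final BlockInfoUnderConstruction blockinfo = (BlockInfoUnderConstruction) lastBlock;
{code}
This would also answer the null question directly: a null last block fails the instanceof check, so the caller reports it instead of hitting an unexpected exception later.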