Date: Mon, 29 Mar 2010 18:05:27 +0000 (UTC)
From: "Karthik Ranganathan (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] Updated: (HDFS-142) Datanode should delete files under tmp when upgraded from 0.17

     [ https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthik Ranganathan updated HDFS-142:
-------------------------------------

    Attachment: HDFS-142-multiple-blocks-datanode-exception.patch

Hey guys,

If there are two or more files in the blocks-being-written directory, the datanode cannot start up, because the code tries to add the BlockAndFile objects to a TreeSet internally but BlockAndFile does not implement Comparable. The first addition goes through because the TreeSet has nothing to compare it against.
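As a standalone illustration (this is not Hadoop code; the class below is just a stand-in for FSDataset$BlockAndFile), the failure mode boils down to the following:

import java.util.TreeSet;

public class TreeSetComparableDemo {

  // Stand-in for FSDataset$BlockAndFile: deliberately does NOT implement Comparable.
  static class BlockAndFile {
    final long blockId;
    BlockAndFile(long blockId) { this.blockId = blockId; }
  }

  public static void main(String[] args) {
    TreeSet<BlockAndFile> set = new TreeSet<BlockAndFile>();
    // On the JDK 6-era runtime described above, this first add succeeds because the empty
    // set has nothing to compare against (newer JDKs type-check the key even on the first insert).
    set.add(new BlockAndFile(1L));
    // Adding a second element forces a comparison and throws java.lang.ClassCastException
    // ("BlockAndFile cannot be cast to java.lang.Comparable"), the exception in the stack trace below.
    set.add(new BlockAndFile(2L));
  }
}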
The datanode dies on restart with the following exception:

2010-03-23 15:50:23,152 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.lang.ClassCastException: org.apache.hadoop.hdfs.server.datanode.FSDataset$BlockAndFile cannot be cast to java.lang.Comparable
        at java.util.TreeMap.put(TreeMap.java:542)
        at java.util.TreeSet.add(TreeSet.java:238)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSDir.getBlockAndFileInfo(FSDataset.java:247)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.recoverBlocksBeingWritten(FSDataset.java:539)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset$FSVolume.<init>(FSDataset.java:381)
        at org.apache.hadoop.hdfs.server.datanode.FSDataset.<init>(FSDataset.java:895)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:305)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:219)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:1337)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:1292)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:1300)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.main(DataNode.java:1422)

This patch makes the BlockAndFile class implement Comparable, and adds a unit test (thanks Nick) that verifies this case. A rough sketch of what such a change can look like follows the quoted issue below.

> Datanode should delete files under tmp when upgraded from 0.17
> --------------------------------------------------------------
>
>                 Key: HDFS-142
>                 URL: https://issues.apache.org/jira/browse/HDFS-142
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Raghu Angadi
>            Assignee: dhruba borthakur
>            Priority: Blocker
>         Attachments: appendQuestions.txt, deleteTmp.patch, deleteTmp2.patch, deleteTmp5_20.txt, deleteTmp5_20.txt, deleteTmp_0.18.patch, handleTmp1.patch, hdfs-142-minidfs-fix-from-409.txt, HDFS-142-multiple-blocks-datanode-exception.patch, HDFS-142_20.patch
>
>
> Before 0.18, when the Datanode restarts it deletes files under the data-dir/tmp directory, since these files are no longer valid. But in 0.18 it moves these files to the normal directory, incorrectly making them valid blocks. One of the following would work:
> - remove the tmp files during upgrade, or
> - if the files under /tmp are in pre-0.18 format (i.e. no generation stamp), delete them.
> Currently the effect of this bug is that these files end up failing block verification and eventually get deleted, but they cause incorrect over-replication at the namenode before that.
> Also, it looks like our policy regarding the treatment of files under tmp needs to be defined better. Right now there are probably one or two more bugs with it. Dhruba, please file them if you remember.
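As referenced above, the attached HDFS-142-multiple-blocks-datanode-exception.patch is the authoritative fix; the snippet below is only a rough sketch of the general shape of such a change, using illustrative (assumed) field names rather than the real FSDataset$BlockAndFile members, and ordering instances by block id so that TreeSet insertion has a natural ordering to work with:

import java.io.File;

// Illustrative stand-in only; see the attached patch for the actual change to FSDataset$BlockAndFile.
class BlockAndFile implements Comparable<BlockAndFile> {
  private final long blockId;  // assumed field, for illustration
  private final File file;     // assumed field, for illustration

  BlockAndFile(long blockId, File file) {
    this.blockId = blockId;
    this.file = file;
  }

  // Natural ordering by block id, so instances can live in a TreeSet/TreeMap without a Comparator.
  public int compareTo(BlockAndFile other) {
    return blockId < other.blockId ? -1 : (blockId == other.blockId ? 0 : 1);
  }
}

If equals/hashCode are also defined on the class, they should stay consistent with this ordering so that sorted and hashed collections agree on what counts as a duplicate.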