hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-142) In 0.20, move blocks being written into a blocksBeingWritten directory
Date Thu, 13 May 2010 01:55:45 GMT

    [ https://issues.apache.org/jira/browse/HDFS-142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12866940#action_12866940 ]

Todd Lipcon commented on HDFS-142:
----------------------------------

Had a test failure of TestFileAppend2 today with:

    [junit] 2010-05-12 12:20:46,249 WARN  protocol.InterDatanodeProtocol (DataNode.java:recoverBlock(1537)) - Failed to getBlockMetaDataInfo for block (=blk_7206139570868165957_1054) from datanode (=127.0.0.1:42179)
    [junit] java.io.IOException: Block blk_7206139570868165957_1054 does not exist in volumeMap.
    [junit]     at org.apache.hadoop.hdfs.server.datanode.FSDataset.validateBlockMetadata(FSDataset.java:1250)
    [junit]     at org.apache.hadoop.hdfs.server.datanode.DataNode.getBlockMetaDataInfo(DataNode.java:1425)
    [junit]     at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1521)
    [junit]     at org.apache.hadoop.hdfs.server.datanode.DataNode.recoverBlock(DataNode.java:1616)

This failure was actually on our vanilla 0.20 Hudson, not on the append branch.

In investigating this I noticed that validateBlockMetadata is not marked synchronized in FSDataset,
and thus accesses the volumeMap HashMap in an unsynchronized manner. If this read races with,
e.g., a rehash of the HashMap, it can report false non-existence.
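
To make the race concrete, here's a hypothetical, much-simplified sketch (the class and method
names below are illustrative stand-ins, not the actual 0.20 FSDataset code). The writers hold the
object monitor while mutating the map, so an unlocked containsKey() can observe the table
mid-rehash and report a block as missing; marking the reader synchronized closes that window:

{code:java}
import java.io.IOException;
import java.util.HashMap;

// Hypothetical stand-in for the relevant bits of FSDataset (0.20-era Java).
class VolumeMapSketch {
  // Plain HashMap guarded by this object's monitor, like volumeMap in 0.20.
  private final HashMap<Long, Object> volumeMap = new HashMap<Long, Object>();

  // Writers take the monitor; a put() here can trigger a rehash of the table.
  synchronized void addBlock(long blockId, Object info) {
    volumeMap.put(blockId, info);
  }

  // Unsynchronized reader, analogous to the validateBlockMetadata call in the
  // stack trace above: if it runs concurrently with a rehash, the result of
  // containsKey() is undefined (including a false "not found").
  void validateRacy(long blockId) throws IOException {
    if (!volumeMap.containsKey(blockId)) {
      throw new IOException("Block " + blockId + " does not exist in volumeMap.");
    }
  }

  // The obvious fix: take the same monitor the writers use, so the read never
  // sees a half-rehashed table.
  synchronized void validateSafe(long blockId) throws IOException {
    if (!volumeMap.containsKey(blockId)) {
      throw new IOException("Block " + blockId + " does not exist in volumeMap.");
    }
  }
}
{code}

Switching volumeMap to a ConcurrentHashMap would also work, but just taking the existing FSDataset
lock in the reader is the smaller change for the 0.20 branch.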

This doesn't seem to be a problem in trunk append, since the function is gone there.

> In 0.20, move blocks being written into a blocksBeingWritten directory
> ----------------------------------------------------------------------
>
>                 Key: HDFS-142
>                 URL: https://issues.apache.org/jira/browse/HDFS-142
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Raghu Angadi
>            Assignee: dhruba borthakur
>            Priority: Blocker
>         Attachments: appendQuestions.txt, deleteTmp.patch, deleteTmp2.patch, deleteTmp5_20.txt,
> deleteTmp5_20.txt, deleteTmp_0.18.patch, handleTmp1.patch, hdfs-142-commitBlockSynchronization-unknown-datanode.txt,
> HDFS-142-deaddn-fix.patch, HDFS-142-finalize-fix.txt, hdfs-142-minidfs-fix-from-409.txt, HDFS-142-multiple-blocks-datanode-exception.patch,
> hdfs-142-recovery-reassignment-and-bbw-cleanup.txt, hdfs-142-testcases.txt, hdfs-142-testleaserecovery-fix.txt,
> HDFS-142_20.patch, testfileappend4-deaddn.txt
>
>
> Before 0.18, when the Datanode restarts, it deletes files under the data-dir/tmp directory,
> since these files are no longer valid. But in 0.18 it incorrectly moves these files into the
> normal directory, making them valid blocks. One of the following would work:
> - remove the tmp files during upgrade, or
> - if the files under /tmp are in pre-0.18 format (i.e. no generation stamp), delete them.
> Currently the effect of this bug is that these files end up failing block verification and
> eventually get deleted, but they cause incorrect over-replication at the namenode before that.
> Also, it looks like our policy regarding the treatment of files under tmp needs to be defined
> better. Right now there are probably one or two more bugs with it. Dhruba, please file them
> if you remember.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

