hadoop-hdfs-issues mailing list archives

From "Ravi Prakash (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1940) Datanode can have more than one copy of same block when a failed disk is coming back in datanode
Date Fri, 06 Apr 2012 22:08:15 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13248924#comment-13248924 ]

Ravi Prakash commented on HDFS-1940:

I checked in 0.23.3 (d60e9678bbc4d52fb9ab5d65363d452cc5926cff), and copying a block (and meta)
file from one disk to another does not show up in fsck. The DirectoryScanner does detect the
extra block that is not present in the memory map, e.g. (here I had only 1 file = 1 block
in HDFS and then made a copy)
{noformat}2012-04-06 15:49:50,958 INFO  datanode.DirectoryScanner (DirectoryScanner.java:scan(389))
- BlockPool BP-1909597932- Total blocks: 2, missing metadata files:0,
missing block files:0, missing blocks in memory:1, mismatched blocks:0{noformat}

When I deleted the file from HDFS, I saw that the copied block (rather than the original
block) was deleted. I retried the experiment, corrupted the original block, and restarted
HDFS. On cat-ing the file, the original uncorrupted data was served from the copied block.
But when I corrupted the copied block and restarted HDFS, it was not able to serve the data
from the uncorrupted original block. This is a bummer. 
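The asymmetry above (the datanode trusting whichever replica it registered, even when a valid copy sits on another disk) can be mimicked outside HDFS. A minimal local sketch, no real cluster involved and all file names hypothetical: duplicate a "block" file across two "disks", corrupt one copy, and tell the copies apart by checksum, which is the same kind of mismatch the on-disk .meta checksums would expose:

```shell
# Local illustration only (no HDFS): simulate one block replica
# duplicated across two "disks", then corrupt one copy.
tmp=$(mktemp -d)
mkdir -p "$tmp/disk1" "$tmp/disk2"
printf 'original block data' > "$tmp/disk1/blk_42"   # hypothetical block file
cp "$tmp/disk1/blk_42" "$tmp/disk2/blk_42"           # the duplicate replica
# Flip the first byte of the copy in place, as a stand-in for corruption.
printf 'X' | dd of="$tmp/disk2/blk_42" bs=1 count=1 conv=notrunc 2>/dev/null
cksum "$tmp/disk1/blk_42" "$tmp/disk2/blk_42"        # checksums now differ
```

Which copy the datanode actually serves after a restart then depends only on which replica it registers in its memory map, not on which one still matches its checksum.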
> Datanode can have more than one copy of same block when a failed disk is coming back
in datanode
> ------------------------------------------------------------------------------------------------
>                 Key: HDFS-1940
>                 URL: https://issues.apache.org/jira/browse/HDFS-1940
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions:
>            Reporter: Rajit Saha
>            Assignee: Bharath Mundlapudi
> There is a situation where one datanode can have more than one copy of the same block, when
a disk fails and comes back after some time in a datanode. These duplicate blocks are
not deleted even after datanode and namenode restarts.
> This situation can only happen in a corner case, when due to a disk failure the data
block is replicated to another disk of the same datanode.
> To simulate this scenario I copied a data block and the associated .meta file from one
disk to another disk of the same datanode, so the datanode has two copies of the same replica.
I then restarted the datanode and namenode. The extra data block and meta file are still not
deleted from the datanode
> ls -l `find /grid/{0,1,2,3}/hadoop/var/hdfs/data/current -name blk_*`
> -rw-r--r-- 1 hdfs users 7814 May 13 21:05 /grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376
> -rw-r--r-- 1 hdfs users   71 May 13 21:05 /grid/1/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta
> -rw-r--r-- 1 hdfs users 7814 May 13 21:14 /grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376
> -rw-r--r-- 1 hdfs users   71 May 13 21:14 /grid/3/hadoop/var/hdfs/data/current/blk_1727421609840461376_579992.meta
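Duplicate replicas like the blk_1727421609840461376 pair above can be located mechanically. The following is a sketch, not a supported tool: the volume-directory layout is taken from the `ls` listing above, and the pipeline simply groups block files by basename across the given directories and prints any block that exists on more than one disk:

```shell
#!/bin/sh
# Hypothetical helper: print block files that exist on more than one
# volume directory of the same datanode.
# Usage: sh find-dup-replicas.sh DIR [DIR ...]
find "$@" -name 'blk_*' ! -name '*.meta' -type f 2>/dev/null \
  | awk -F/ '{
      paths[$NF] = paths[$NF] ? paths[$NF] "\n" $0 : $0  # group paths by basename
      count[$NF]++
    }
    END {
      for (b in count) if (count[b] > 1) print paths[b]  # only duplicated blocks
    }'
```

For the layout in this report one would run it as `sh find-dup-replicas.sh /grid/{0,1,2,3}/hadoop/var/hdfs/data/current`; any output indicates a block with replicas on multiple disks of the same datanode.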

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

