hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6833) DirectoryScanner should not register a deleting block with memory of DataNode
Date Tue, 18 Nov 2014 18:11:35 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14216528#comment-14216528 ]

Chris Nauroth commented on HDFS-6833:
-------------------------------------

Thank you for working on this, Shinichi.  Echoing earlier comments, I'm a bit confused about
why {{DirectoryScanner}} has the responsibility to call {{FsDatasetSpi#removeDeletedBlocks}}.
 This causes us to delete the block ID from the internal data structure tracking still-to-be-deleted-from-disk
blocks.  This part of the code is logically disconnected from the code that actually does
the delete syscall, so it has no way to guarantee that the delete has really finished.  It
seems there would still be a race condition.  If the next scan triggered before the delete
completed, then the scanner wouldn't know that the block is still waiting to be deleted. 
(Of course, I'd expect this to be extremely rare given the fact that scan periods are usually
quite long, 6 hours by default.)  Moving this logic closer to the actual delete in {{ReplicaFileDeleteTask}}
would address this.
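
To make that suggestion concrete, here is a rough sketch, not the actual patch, of what moving
the cleanup into {{ReplicaFileDeleteTask}} could look like; the field layout and the
{{removeDeletedBlocks}} signature below are simplified assumptions on my part:

{code}
import java.io.File;
import java.util.Collections;

// Sketch only: clear the "pending deletion" bookkeeping right where the delete
// happens, so the record is removed only after the files are really gone.
class ReplicaFileDeleteTask implements Runnable {
  private final FsDatasetImpl dataset;  // assumed back-reference to the dataset
  private final String bpid;
  private final long blockId;
  private final File blockFile;
  private final File metaFile;

  ReplicaFileDeleteTask(FsDatasetImpl dataset, String bpid, long blockId,
      File blockFile, File metaFile) {
    this.dataset = dataset;
    this.bpid = bpid;
    this.blockId = blockId;
    this.blockFile = blockFile;
    this.metaFile = metaFile;
  }

  @Override
  public void run() {
    boolean deleted = blockFile.delete() && metaFile.delete();
    if (deleted) {
      // Only now is it safe to forget that this block was waiting to be
      // deleted; a DirectoryScanner pass before this point still sees the
      // block as "deleting" and will not re-register it in memory.
      dataset.removeDeletedBlocks(bpid, Collections.singletonList(blockId));
    }
  }
}
{code}

That way the scanner never has to reason about whether an asynchronous delete has actually finished.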

I'm curious if you can provide any more details about why this is so easy to reproduce in
your environment.  There is no doubt there is a bug here, but from what I can tell, it has
been there a long time, and I'd expect it to occur only very rarely.  The scan period is so
long (again, 6 hours by default) that I can't see how this can happen very often.  Your comments
seem to suggest that you can see this happen regularly, and on multiple DataNodes simultaneously,
resulting in data loss.  That would require scanners on independent DataNodes landing in a
lock-step schedule with each other.  For a typical 3-replica file, this should be very unlikely.
 For a 1 or even a 2-replica file, there is already a much higher risk of data loss due to
hardware failure despite this bug.  Is there anything specific to your configuration that
could make this more likely?  Have you configured the scan period to something much more frequent?
 Are you very rapidly decommissioning and recommissioning nodes?
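
For reference, the scan period is controlled by {{dfs.datanode.directoryscan.interval}} in
hdfs-site.xml, in seconds (21600, i.e. 6 hours, by default).  A hypothetical aggressive setting
like the one below would widen the window for this race considerably:

{code}
<!-- hdfs-site.xml: hypothetical aggressive value, not a recommendation -->
<property>
  <name>dfs.datanode.directoryscan.interval</name>
  <value>300</value> <!-- seconds; the default is 21600 (6 hours) -->
</property>
{code}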

> DirectoryScanner should not register a deleting block with memory of DataNode
> -----------------------------------------------------------------------------
>
>                 Key: HDFS-6833
>                 URL: https://issues.apache.org/jira/browse/HDFS-6833
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 3.0.0, 2.5.0, 2.5.1
>            Reporter: Shinichi Yamashita
>            Assignee: Shinichi Yamashita
>            Priority: Critical
>         Attachments: HDFS-6833-6-2.patch, HDFS-6833-6-3.patch, HDFS-6833-6.patch, HDFS-6833-7-2.patch,
HDFS-6833-7.patch, HDFS-6833.8.patch, HDFS-6833.9.patch, HDFS-6833.patch, HDFS-6833.patch,
HDFS-6833.patch, HDFS-6833.patch, HDFS-6833.patch
>
>
> When a block is deleted in DataNode, the following messages are usually output.
> {code}
> 2014-08-07 17:53:11,606 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
for deletion
> 2014-08-07 17:53:11,617 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
> {code}
> However, in the current implementation, DirectoryScanner may run while the DataNode is still
deleting the block, and the following messages are output.
> {code}
> 2014-08-07 17:53:30,519 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
Scheduling blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
for deletion
> 2014-08-07 17:53:31,426 INFO org.apache.hadoop.hdfs.server.datanode.DirectoryScanner:
BlockPool BP-1887080305-172.28.0.101-1407398838872 Total blocks: 1, missing metadata files:0,
missing block files:0, missing blocks in memory:1, mismatched blocks:0
> 2014-08-07 17:53:31,426 WARN org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl:
Added missing block to memory FinalizedReplica, blk_1073741825_1001, FINALIZED
>   getNumBytes()     = 21230663
>   getBytesOnDisk()  = 21230663
>   getVisibleLength()= 21230663
>   getVolume()       = /hadoop/data1/dfs/data/current
>   getBlockFile()    = /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
>   unlinked          =false
> 2014-08-07 17:53:31,531 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetAsyncDiskService:
Deleted BP-1887080305-172.28.0.101-1407398838872 blk_1073741825_1001 file /hadoop/data1/dfs/data/current/BP-1887080305-172.28.0.101-1407398838872/current/finalized/subdir0/subdir0/blk_1073741825
> {code}
> The information for the block being deleted is re-registered in the DataNode's memory.
> When the DataNode then sends a block report, the NameNode receives wrong block information.
> For example, when we recommission a node or change the replication factor, the NameNode
may delete a valid block as "ExcessReplicate" because of this problem, and "Under-Replicated
Blocks" and "Missing Blocks" occur.
> When the DataNode runs DirectoryScanner, it should not register a block that is being deleted.
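> The guard could look roughly like the sketch below (not the actual patch; the {{deletingBlocks}}
structure and the simplified {{checkAndUpdate}} signature are assumptions): the reconciliation
path skips any replica whose deletion is still in flight.
> {code}
> // Sketch only: skip re-registering a replica whose asynchronous deletion has
> // been scheduled but has not finished yet.  "deletingBlocks" is an assumed
> // per-block-pool set of block IDs pending deletion.
> private final Map<String, Set<Long>> deletingBlocks = new HashMap<>();
>
> void checkAndUpdate(String bpid, long blockId, File diskFile, File diskMetaFile) {
>   synchronized (this) {
>     Set<Long> pending = deletingBlocks.get(bpid);
>     if (pending != null && pending.contains(blockId)) {
>       return;  // deletion in flight; do not add the replica back to memory
>     }
>     // ... existing reconciliation logic (add missing replica, fix mismatches) ...
>   }
> }
> {code}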



