hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4067) TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
Date Fri, 19 Oct 2012 23:42:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480529#comment-13480529
] 

Jing Zhao commented on HDFS-4067:
---------------------------------

Moving the discussion from HDFS-4061 here:

When the NameNode invalidates a block for a datanode D1, it removes the datanode-block pair
from the blockMap. Before the invalidation request is sent to D1, BlockManager#computeDataNodeWork
can also run and schedule a replication of the same block to D1. The invalidation and replication
requests are then sent to D1 at the same time. D1 rejects the replication request (throwing
a ReplicaAlreadyExistsException) and then deletes the replica, so the NN never receives the
blockReceived message from D1. The test case times out after 5 min, which is shorter than the
timeout of the PendingReplication request (usually 5~10 min).
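
The interleaving described above can be sketched with a small toy model. This is
only an illustration under stated assumptions: the class InvalidateReplicateRaceSketch,
its Outcome type, and the run/tryWrite methods are hypothetical names and do not
correspond to real HDFS classes. Sets stand in for the NameNode's blockMap entry for
D1 and for D1's local replicas, and a rejected write stands in for the
ReplicaAlreadyExistsException.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the race; none of these names come from the HDFS code base.
class InvalidateReplicateRaceSketch {

    /** State observed after D1 has handled both commands. */
    static final class Outcome {
        final boolean writeRejected;         // stands in for ReplicaAlreadyExistsException
        final boolean blockReceivedReported; // would the NN get a blockReceived from D1?
        final boolean replicaOnD1;           // does D1 still hold the replica?

        Outcome(boolean writeRejected, boolean blockReceivedReported, boolean replicaOnD1) {
            this.writeRejected = writeRejected;
            this.blockReceivedReported = blockReceivedReported;
            this.replicaOnD1 = replicaOnD1;
        }
    }

    /** Returns true if the replica was stored, false if one already existed. */
    static boolean tryWrite(Set<String> replicas, String block) {
        return replicas.add(block);
    }

    static Outcome run(boolean writeArrivesBeforeDelete) {
        final String block = "blk_1";
        Set<String> d1Replicas = new HashSet<>();   // what D1 has on disk
        d1Replicas.add(block);
        Set<String> nnBlocksOnD1 = new HashSet<>(); // NN's view of D1 (blockMap entry)
        nnBlocksOnD1.add(block);

        // The NN invalidates the replica: the pair leaves the block map right away,
        // but the delete command has not reached D1 yet.
        nnBlocksOnD1.remove(block);

        // Replication scheduling now sees D1 as a node that lacks the block.
        boolean d1ChosenAsTarget = !nnBlocksOnD1.contains(block);

        boolean stored = false;
        boolean attempted = false;
        if (writeArrivesBeforeDelete) {
            // Racy order: the write hits D1 while the old replica is still there.
            if (d1ChosenAsTarget) {
                attempted = true;
                stored = tryWrite(d1Replicas, block);
            }
            d1Replicas.remove(block); // the delete is then applied
        } else {
            // Benign order: the delete is applied first, then the write succeeds.
            d1Replicas.remove(block);
            if (d1ChosenAsTarget) {
                attempted = true;
                stored = tryWrite(d1Replicas, block);
            }
        }
        return new Outcome(attempted && !stored, stored, d1Replicas.contains(block));
    }

    public static void main(String[] args) {
        Outcome racy = run(true);
        Outcome benign = run(false);
        System.out.println("racy:   rejected=" + racy.writeRejected
                + " blockReceived=" + racy.blockReceivedReported
                + " replicaOnD1=" + racy.replicaOnD1);
        System.out.println("benign: rejected=" + benign.writeRejected
                + " blockReceived=" + benign.blockReceivedReported
                + " replicaOnD1=" + benign.replicaOnD1);
    }
}
```

In this model, run(true) corresponds to the failing case: the write is rejected,
no blockReceived is reported, and D1 ends up without the replica, so only the
PendingReplication timeout could recover the block.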
                
> TestUnderReplicatedBlocks may fail due to ReplicaAlreadyExistsException
> -----------------------------------------------------------------------
>
>                 Key: HDFS-4067
>                 URL: https://issues.apache.org/jira/browse/HDFS-4067
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.0.0-alpha
>            Reporter: Eli Collins
>            Assignee: Jing Zhao
>              Labels: test-fail
>
> After adding the timeout to TestUnderReplicatedBlocks in HDFS-4061 we can see the root
> cause of the failure is ReplicaAlreadyExistsException:
> {noformat}
> org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1541130889-172.29.121.238-1350435573411:blk_-3437032108997618258_1002 already exists in state FINALIZED and thus cannot be created.
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:799)
> 	at org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl.createTemporary(FsDatasetImpl.java:90)
> 	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:155)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:393)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:219)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
