hadoop-hdfs-issues mailing list archives

From "VinayaKumar B (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2932) Under replicated block after the pipeline recovery.
Date Thu, 22 Mar 2012 03:20:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13235301#comment-13235301 ]

VinayaKumar B commented on HDFS-2932:
-------------------------------------

Scenario:
 1. A client is writing to the pipeline DN1 --> DN2 --> DN3. Example block: blk_1_1001.
 2. DN3 is stopped mid-write. Pipeline recovery happens and the generation stamp is bumped, so the block becomes blk_1_1002.
 3. The write completes and the stream is closed.
 4. DN3 is restarted.
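
The scenario above can be modelled with a small sketch. This is only an illustration under stated assumptions: PipelineRecoverySketch and its fields are hypothetical names, not HDFS classes, and the only behaviour modelled is that recovery bumps the generation stamp on the surviving datanodes while the stopped node keeps the old one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the scenario; none of these names exist in HDFS.
class PipelineRecoverySketch {
    // Generation stamp each datanode holds on disk for the block being written.
    final Map<String, Long> genStampByNode = new HashMap<>();
    // Datanodes currently in the write pipeline.
    final List<String> pipeline = new ArrayList<>();
    long currentGenStamp;

    PipelineRecoverySketch(long genStamp, String... nodes) {
        currentGenStamp = genStamp;
        for (String node : nodes) {
            pipeline.add(node);
            genStampByNode.put(node, genStamp);
        }
    }

    // Drop the failed node and bump the generation stamp on the survivors only.
    // The failed node keeps whatever stamp it had when it stopped.
    void recover(String failedNode) {
        pipeline.remove(failedNode);
        currentGenStamp++;
        for (String node : pipeline) {
            genStampByNode.put(node, currentGenStamp);
        }
    }
}
```

Starting with generation stamp 1001 on DN1, DN2 and DN3 and calling recover("DN3") leaves DN1 and DN2 at 1002 and DN3 at 1001, which is the stale replica DN3 reports after its restart.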

*Issue Case 1: DN3 coming back after file is closed.*
----------------------------------------------------
>> DN3 now sends its block report to the NN, and it contains blk_1_1001 in RBW state.
>> Since the file is already closed by this time, the NN marks this replica as corrupt.
>> Re-replication cannot succeed, because the NN cannot find one more datanode to use as a target.
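
A rough sketch of why no target is found on a three-datanode cluster. It assumes (this is an assumption, not confirmed HDFS behaviour) that a node already holding any replica of the block, valid or corrupt, is excluded as a replication target; TargetChoiceSketch and eligibleTargets are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration; not the NameNode's real target selection.
class TargetChoiceSketch {
    // Returns the live nodes that do not already hold a replica of the block.
    // Assumption: a node holding a corrupt replica still counts as a holder.
    static List<String> eligibleTargets(List<String> liveNodes, Set<String> holders) {
        List<String> targets = new ArrayList<>();
        for (String node : liveNodes) {
            if (!holders.contains(node)) {
                targets.add(node);
            }
        }
        return targets;
    }
}
```

With live nodes DN1, DN2, DN3 and all three counted as holders (DN3 through its corrupt replica), the returned list is empty, so the block stays under-replicated.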

*Issue Case 2: DN3 coming back before the file closure.*
------------------------------------------------------
>> DN3 now sends its block report to the NN, and it contains blk_1_1001 in RBW state. Since the file is not yet closed, this DN is simply added to the targets array.
>> A replication request is sent to another DN (e.g. DN2) to replicate this block to DN3.
>> DN3 refuses the replication and throws ReplicaAlreadyExistsException, because the generation stamp is not considered when checking whether the block already exists.
{noformat}
2012-03-22 08:30:39,406 ERROR datanode.DataNode (DataXceiver.java:run(171)) - 127.0.0.1:59082:DataXceiver error processing WRITE_BLOCK operation  src: /127.0.0.1:59124 dest: /127.0.0.1:59082
org.apache.hadoop.hdfs.server.datanode.ReplicaAlreadyExistsException: Block BP-1348337625-169.254.103.145-1332385233856:blk_-4842149393874243436_1003 already exists in state RWR and thus cannot be created.
	at org.apache.hadoop.hdfs.server.datanode.FSDataset.createTemporary(FSDataset.java:1740)
	at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:151)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:340)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:98)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:66)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:167)
	at java.lang.Thread.run(Unknown Source)
{noformat}
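
To make the Case 2 failure concrete, here is a minimal sketch of the two kinds of existence check. It is not the actual FSDataset code: ReplicaLookupSketch and its methods are hypothetical, and the generation-stamp-aware rule shown (an existing replica blocks creation only if its stamp is at least as new as the incoming one) is just one possible policy, assumed for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a datanode replica map; not the real FSDataset.
class ReplicaLookupSketch {
    // Stored generation stamp, keyed by block ID only.
    private final Map<Long, Long> genStampByBlockId = new HashMap<>();

    void add(long blockId, long genStamp) {
        genStampByBlockId.put(blockId, genStamp);
    }

    // ID-only check: any replica with the same block ID counts as existing,
    // even when its generation stamp is stale.
    boolean existsIgnoringGenStamp(long blockId) {
        return genStampByBlockId.containsKey(blockId);
    }

    // Assumed alternative: a stored replica blocks creation only when its
    // generation stamp is at least as new as the one being written.
    boolean existsWithGenStamp(long blockId, long genStamp) {
        Long stored = genStampByBlockId.get(blockId);
        return stored != null && stored >= genStamp;
    }
}
```

If DN3 holds block 1 with stamp 1001 and is asked to accept stamp 1002, the ID-only check reports the replica as existing and the write is refused, while the stamp-aware check would let the newer replica be created.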


*Basic question:*
 1. Why is the generation stamp not considered when comparing blocks?
       This behaviour differs from version 1.0.
    
                
> Under replicated block after the pipeline recovery.
> ---------------------------------------------------
>
>                 Key: HDFS-2932
>                 URL: https://issues.apache.org/jira/browse/HDFS-2932
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.24.0
>            Reporter: J.Andreina
>             Fix For: 0.24.0
>
>
> Started 1 NN, DN1, DN2 and DN3 on the same machine.
> Wrote a huge file of size 2 GB.
> While the write for block-id-1005 was in progress, brought down DN3.
> After pipeline recovery, the block stamp changed to block_id_1006 on DN1 and DN2.
> After the write was over, DN3 was brought up and the fsck command was issued.
> The following message is displayed:
> "block-id_1006 is under-replicated. Target replicas is 3 but found 2 replicas".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
