hadoop-hdfs-issues mailing list archives

From "Vinay (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3436) Append to file is failing when one of the datanode where the block present is down.
Date Thu, 17 May 2012 11:01:09 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13277724#comment-13277724 ]

Vinay commented on HDFS-3436:
-----------------------------

Scenario is as follows:
---------------------
1. The cluster has 4 DNs.
2. A file is written to 3 DNs, DN1->DN2->DN3, with a genstamp of 1001.
3. Now DN3 is stopped.
4. Now append is called.
5. For this append the client will try to create the pipeline DN1->DN2->DN3.
During pipeline setup the following happens on each datanode:
1. The generation stamp is updated in the volumeMap to 1002.
2. The datanode then tries to connect to the next DN in the pipeline.
If the next DN in the pipeline is down, an exception is thrown and the client tries to re-form
the pipeline.

Since DN3 is down, the genstamp on DN1 and DN2 has already been updated to 1002, but the client
does not know about this.
6. The client now tries to add one more datanode, DN4, to the append pipeline and asks DN1
or DN2 to transfer the block to DN4. However, the client asks for the transfer with genstamp 1001.
7. Since DN1 and DN2 no longer have the block with genstamp 1001, the transfer fails and the
client write fails as well.
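The genstamp mismatch in steps 5-7 can be illustrated with a minimal, hypothetical model. The class below is not the real HDFS volume map; it only mimics the two behaviours that matter here: the append setup bumps the stored genstamp before the mirror connection is attempted, and a later transfer request must name the exact genstamp the datanode has on record.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of a datanode's replica map (blockId -> genstamp).
// Not the real HDFS API; names are illustrative only.
class VolumeMap {
    private final Map<Long, Long> blockToGenstamp = new HashMap<>();

    void addReplica(long blockId, long genstamp) {
        blockToGenstamp.put(blockId, genstamp);
    }

    // Step 5.1: append pipeline setup bumps the genstamp before the mirror
    // connection is attempted, so the bump happens even if a downstream DN is dead.
    void bumpGenstampForAppend(long blockId, long newGenstamp) {
        blockToGenstamp.put(blockId, newGenstamp);
    }

    // Step 6: a transfer to a replacement datanode only succeeds when the
    // genstamp named by the client matches the one stored locally.
    boolean canTransfer(long blockId, long clientGenstamp) {
        Long stored = blockToGenstamp.get(blockId);
        return stored != null && stored == clientGenstamp;
    }
}
```

Once `bumpGenstampForAppend` has moved the stored genstamp to 1002, a transfer request still carrying the client's stale 1001 is rejected, which is exactly the failure in step 7.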

Proposed solution
------------------
In DataXceiver#writeBlock(), attempt the mirror connection to the next datanode before creating
the BlockReceiver instance. Then, if the downstream datanode is down, pipeline setup fails before
the genstamp is updated locally, which solves the problem.
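The reordering can be sketched as follows. This is a hypothetical simplification, not the actual DataXceiver code: `Mirror` and `setUpWriteBlock` stand in for the mirror-connection and local-state-update steps, and the point is only the ordering, i.e. a dead downstream datanode makes setup fail before any genstamp change is committed.

```java
import java.io.IOException;

// Hypothetical sketch of the proposed ordering in pipeline setup.
class PipelineSetup {
    // Stand-in for the connection to the next datanode in the pipeline.
    interface Mirror { void connect() throws IOException; }

    // Returns the genstamp the replica ends up with after setup.
    static long setUpWriteBlock(long newGenstamp, Mirror next) throws IOException {
        // Proposed order: connect to the downstream datanode first; this
        // throws if that datanode is down, aborting setup early ...
        next.connect();
        // ... and only then commit the genstamp bump (in the real code, this
        // is where the BlockReceiver would be created).
        return newGenstamp;
    }
}
```

With this order, a failed mirror connection leaves the replica at its old genstamp, so the client's subsequent transfer request (which still names the old genstamp) can succeed.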
                
> Append to file is failing when one of the datanode where the block present is down.
> -----------------------------------------------------------------------------------
>
>                 Key: HDFS-3436
>                 URL: https://issues.apache.org/jira/browse/HDFS-3436
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 2.0.0
>            Reporter: Brahma Reddy Battula
>            Assignee: Vinay
>
> Scenario:
> =========
> 1. Cluster with 4 DataNodes.
> 2. Wrote a file to 3 DNs, DN1->DN2->DN3.
> 3. Stopped DN3.
> Now append to the file fails because addDatanode2ExistingPipeline fails.
>  *Client Trace* 
> {noformat}
> 2012-04-24 22:06:09,947 INFO  hdfs.DFSClient (DFSOutputStream.java:createBlockOutputStream(1063)) - Exception in createBlockOutputStream
> java.io.IOException: Bad connect ack with firstBadLink as *******:50010
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1053)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:943)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
> 2012-04-24 22:06:09,947 WARN  hdfs.DFSClient (DFSOutputStream.java:setupPipelineForAppendOrRecovery(916)) - Error Recovery for block BP-1023239-10.18.40.233-1335275282109:blk_296651611851855249_1253 in pipeline *****:50010, ******:50010, *****:50010: bad datanode ******:50010
> 2012-04-24 22:06:10,072 WARN  hdfs.DFSClient (DFSOutputStream.java:run(549)) - DataStreamer Exception
> java.io.EOFException: Premature EOF: no length prefix available
> 	at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:866)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:843)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:934)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
> 2012-04-24 22:06:10,072 WARN  hdfs.DFSClient (DFSOutputStream.java:hflush(1515)) - Error while syncing
> java.io.EOFException: Premature EOF: no length prefix available
> 	at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:866)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:843)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:934)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
> java.io.EOFException: Premature EOF: no length prefix available
> 	at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:866)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:843)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:934)
> 	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)
> {noformat}
>  *DataNode Trace*  
> {noformat}
> 2012-05-17 15:39:12,261 ERROR datanode.DataNode (DataXceiver.java:run(193)) - host0.foo.com:49744:DataXceiver error processing TRANSFER_BLOCK operation  src: /127.0.0.1:49811 dest: /127.0.0.1:49744
> java.io.IOException: BP-2001850558-xx.xx.xx.xx-1337249347060:blk_-8165642083860293107_1002 is neither a RBW nor a Finalized, r=ReplicaBeingWritten, blk_-8165642083860293107_1003, RBW
>   getNumBytes()     = 1024
>   getBytesOnDisk()  = 1024
>   getVisibleLength()= 1024
>   getVolume()       = E:\MyWorkSpace\branch-2\Test\build\test\data\dfs\data\data1\current
>   getBlockFile()    = E:\MyWorkSpace\branch-2\Test\build\test\data\dfs\data\data1\current\BP-2001850558-xx.xx.xx.xx-1337249347060\current\rbw\blk_-8165642083860293107
>   bytesAcked=1024
>   bytesOnDisk=102
> 	at org.apache.hadoop.hdfs.server.datanode.DataNode.transferReplicaForPipelineRecovery(DataNode.java:2038)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.transferBlock(DataXceiver.java:525)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opTransferBlock(Receiver.java:114)
> 	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:78)
> 	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189)
> 	at java.lang.Thread.run(Unknown Source)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
