hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5713) File write fails after data node goes down
Date Tue, 12 May 2009 13:49:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708437#action_12708437 ]

dhruba borthakur commented on HADOOP-5713:
------------------------------------------

> when createOutputStream fails, a dfs client should take the failed datanode out of the pipeline, bump the block's generation stamp ...

@Hairong: This was purposely *not done* when we did the client-streaming-data-to-datanodes work.
The reason is that doing this reduces the robustness of the block. You may remember that when
a replica in the pipeline fails, the client continues writing to the other replicas, and the NN
makes no attempt to increase that block's replication factor until the file is closed. This means
that when we remove a datanode from a pipeline, we expose that block to a larger probability of
going "missing or corrupt". This situation is unavoidable when the client has already written
partial data to a block and then encounters an error in the pipeline; in that case we ignore the
bad datanode and continue with the remaining datanode(s).
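
To make the trade-off concrete, here is a minimal sketch of the mid-write policy described above. This is not the actual DFSClient code; the class and method names are illustrative only.

{noformat}
// Minimal sketch (NOT the real DFSClient code) of the mid-write policy:
// drop only the failed datanode and keep streaming to the survivors.
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class MidWriteRecoverySketch {
  static List<String> handleMidWriteError(List<String> pipeline, int badNodeIndex)
      throws IOException {
    List<String> survivors = new ArrayList<String>(pipeline);
    survivors.remove(badNodeIndex);          // ignore only the bad datanode
    if (survivors.isEmpty()) {
      throw new IOException("all datanodes in the pipeline failed");
    }
    // The NN does not re-replicate this block until the file is closed,
    // so the block is held by fewer replicas than requested until then.
    return survivors;
  }
}
{noformat}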


On the other hand, when createOutputStream fails, we have the luxury of ignoring all the
datanodes in the current pipeline, because the client has not yet written any data to any of
the datanodes in the pipeline. We could have ignored only the bad datanode (as you suggested),
but this means the pipeline would be exposed to a higher probability of encountering a "missing/corrupt"
block if the other two replicas also fail sometime in the near future before the file is closed.
In this case, we can avoid this degradation by fetching an entirely new pipeline from the NN.
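
Again purely as a sketch (not the real code path; abandonBlock/addBlock only mirror the ClientProtocol calls in spirit, and their signatures here are assumed), the contrast between the two cases is:

{noformat}
// Sketch only: contrast between the two cases discussed above.
if (bytesWrittenToBlock == 0) {
  // Nothing written yet: give the block back and fetch an entirely new
  // pipeline from the NN, so the block keeps its full replication width.
  namenode.abandonBlock(block, src, clientName);      // assumed signature
  locatedBlock = namenode.addBlock(src, clientName);  // assumed signature
} else {
  // Partial data already on some replicas: drop the bad datanode and
  // continue with the remaining ones (see the previous sketch).
  pipeline = MidWriteRecoverySketch.handleMidWriteError(pipeline, badNodeIndex);
}
{noformat}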

@Alban: Increasing the number of write retries in that case won't help. 

I understand your use-case now. The NN waits for 10 minutes without heartbeats from a datanode
before declaring it dead. Is it possible for you to set dfs.client.block.write.retries to a value
that causes the client to retry for more than 10 minutes? In that case, your test case should succeed.
The idea is that if the client does not bail out (but keeps retrying) for more than 10 minutes,
it is bound to succeed. Please let us know.
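
For example, the attempts in the log below are about 6 seconds apart, so covering the 10-minute window would take roughly 600 / 6 = 100 attempts. Something along these lines should work; the exact value is only a guess and depends on your cluster.

{noformat}
<!-- hadoop-site.xml: keep the client retrying past the ~10 minute window
     the NN needs to declare the datanode dead. 120 retries at ~6 seconds
     per attempt is roughly 12 minutes; the value here is an assumption,
     tune it for your cluster. -->
<property>
  <name>dfs.client.block.write.retries</name>
  <value>120</value>
</property>
{noformat}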

I will also look at your patch in greater detail.




> File write fails after data node goes down
> ------------------------------------------
>
>                 Key: HADOOP-5713
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5713
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Alban Chevignard
>         Attachments: failed_write.patch
>
>
> If a data node goes down while a file is being written to HDFS, the write fails with the following errors:
> {noformat} 
> 09/04/20 17:15:39 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:39 INFO dfs.DFSClient: Abandoning block blk_-6792221430152215651_1003
> 09/04/20 17:15:45 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:45 INFO dfs.DFSClient: Abandoning block blk_-1056044503329698571_1003
> 09/04/20 17:15:51 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:51 INFO dfs.DFSClient: Abandoning block blk_-1144491637577072681_1003
> 09/04/20 17:15:57 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:57 INFO dfs.DFSClient: Abandoning block blk_6574618270268421892_1003
> 09/04/20 17:16:03 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2387)
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1746)
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1924)
> 09/04/20 17:16:03 WARN dfs.DFSClient: Error Recovery for block blk_6574618270268421892_1003 bad datanode[1]
> {noformat} 
> The tests were done with the following configuration:
> * Hadoop version 0.18.3
> * 3 data nodes with replication count of 2
> * 1 GB file write
> * 1 data node taken down during write
> This issue seems to be caused by the fact that there is a delay between the time a data node goes down and the time it is marked as dead by the name node. This delay is unavoidable, but the name node should not keep allocating new blocks to data nodes that the client already knows to be down. Even by adjusting {{heartbeat.recheck.interval}}, there is still a window during which this issue can occur.
> One possible fix would be to allow clients to exclude known bad data nodes when allocating new blocks. See {{failed_write.patch}} for an example.
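
Purely to illustrate the exclusion idea above (the actual change is in failed_write.patch; the overload shown here is hypothetical):

{noformat}
// Hypothetical sketch, not the attached patch: the client remembers the
// datanodes it could not reach and passes them to the NN when asking for a
// new block, so they are not handed back in the next pipeline.
Set<DatanodeInfo> deadNodes = new HashSet<DatanodeInfo>();
deadNodes.add(firstBadLink);                 // node that failed the connect ack
LocatedBlock lb = namenode.addBlock(src, clientName,
    deadNodes.toArray(new DatanodeInfo[deadNodes.size()]));  // hypothetical overload
{noformat}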

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

