hadoop-common-dev mailing list archives

From "Alban Chevignard (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5713) File write fails after data node goes down
Date Mon, 11 May 2009 04:54:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707883#action_12707883 ]

Alban Chevignard commented on HADOOP-5713:
------------------------------------------

You are right that the client does not know why the data node is unavailable, but it does
not necessarily need to. In the proposed solution, the node is excluded for the lifetime of
the output stream only, which does not affect other clients or any of the data structures
on the name node.
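
For illustration, here is a minimal standalone Java sketch of that scoping, with hypothetical names that do not come from {{failed_write.patch}} or the actual DFSClient: the excluded-node set belongs to a single output stream, so it is created when the write starts, discarded when the stream is closed, and never shared with other clients or reported to the name node.

{code:java}
import java.io.Closeable;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch only: the exclusion list is owned by one output stream.
class ExclusionScopedStream implements Closeable {
    // Data nodes this stream has failed to reach (e.g. "192.168.0.66:50010").
    private final Set<String> excludedNodes = new HashSet<String>();

    void exclude(String datanode) {
        excludedNodes.add(datanode);
    }

    boolean isExcluded(String datanode) {
        return excludedNodes.contains(datanode);
    }

    public void close() {
        // The exclusions end with the stream: a node that was briefly
        // unreachable is not penalized for any later write, and no
        // name node data structure is ever touched.
        excludedNodes.clear();
    }
}
{code}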

This issue first came up while testing a cluster with 200 nodes, so it is more a matter of
network topology than of cluster size alone. For example, if a rack in the cluster contains
only two nodes and one of them goes down while a worker on the other node is trying to write
to a file, the name node will keep assigning that dead node to the writer until it realizes
that the node is down. Increasing the number of write retries in that case won't help. This
happens even if there are hundreds of other live nodes in the cluster. Since we have seen
this issue occur on a production cluster, we feel it's definitely worth the additional complexity
on the client to address it.
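
To make the retry behavior concrete, here is a rough, hypothetical sketch of the kind of loop involved (the method names are placeholders, not the code in {{failed_write.patch}}): after a bad connect ack, the block is abandoned and the node named in the ack is recorded, so the next allocation request for this stream can avoid it instead of being handed the same dead node until the name node notices the failure.

{code:java}
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of opening a block pipeline with client-side exclusion.
class PipelineRetrySketch {
    // Nodes this stream has already failed to connect to.
    private final Set<String> excludedNodes = new HashSet<String>();

    void writeNextBlock(int maxRetries) throws IOException {
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            // Ask for targets for a new block, skipping nodes we know are bad.
            String[] targets = requestNewBlock(excludedNodes);
            // Try to set up the write pipeline; null means success, otherwise
            // the address of the first bad link is returned.
            String firstBadLink = connectToPipeline(targets);
            if (firstBadLink == null) {
                return; // pipeline established, block can be written
            }
            abandonBlock(targets);           // give the block back
            excludedNodes.add(firstBadLink); // do not ask for this node again on this stream
        }
        throw new IOException("Unable to create new block.");
    }

    // Placeholders standing in for the client / name node / data node interactions.
    private String[] requestNewBlock(Set<String> excluded) { return new String[] { "192.168.0.65:50010" }; }
    private String connectToPipeline(String[] targets) { return null; }
    private void abandonBlock(String[] targets) { }
}
{code}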


> File write fails after data node goes down
> ------------------------------------------
>
>                 Key: HADOOP-5713
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5713
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Alban Chevignard
>         Attachments: failed_write.patch
>
>
> If a data node goes down while a file is being written to HDFS, the write fails with
> the following errors:
> {noformat} 
> 09/04/20 17:15:39 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
> Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:39 INFO dfs.DFSClient: Abandoning block blk_-6792221430152215651_1003
> 09/04/20 17:15:45 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
> Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:45 INFO dfs.DFSClient: Abandoning block blk_-1056044503329698571_1003
> 09/04/20 17:15:51 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
> Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:51 INFO dfs.DFSClient: Abandoning block blk_-1144491637577072681_1003
> 09/04/20 17:15:57 INFO dfs.DFSClient: Exception in createBlockOutputStream java.io.IOException:
> Bad connect ack with firstBadLink 192.168.0.66:50010
> 09/04/20 17:15:57 INFO dfs.DFSClient: Abandoning block blk_6574618270268421892_1003
> 09/04/20 17:16:03 WARN dfs.DFSClient: DataStreamer Exception: java.io.IOException:
> Unable to create new block.
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2387)
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1746)
> 	at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1924)
> 09/04/20 17:16:03 WARN dfs.DFSClient: Error Recovery for block blk_6574618270268421892_1003 bad datanode[1]
> {noformat} 
> The tests were done with the following configuration:
> * Hadoop version 0.18.3
> * 3 data nodes with replication count of 2
> * 1 GB file write
> * 1 data node taken down during write
> This issue seems to be caused by the delay between the time a data node goes down and the
> time the name node marks it as dead. That delay is unavoidable, but the name node should not
> keep allocating new blocks to data nodes that the client already knows are down. Even after
> adjusting {{heartbeat.recheck.interval}}, there is still a window during which this issue can
> occur.
> One possible fix would be to allow clients to exclude known bad data nodes when allocating
> new blocks. See {{failed_write.patch}} for an example.
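
The fix proposed in the quoted description can be pictured from the allocation side as a call that simply skips whatever nodes the caller has already found unreachable. The interface below is a hypothetical illustration of that idea, not the actual ClientProtocol and not the contents of {{failed_write.patch}}.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical allocation interface: the client passes the nodes it already
// knows are bad, and the allocator leaves them out of the pipeline.
interface BlockAllocator {
    List<String> allocateBlock(String path, int replication, Set<String> excludedNodes);
}

class ExclusionAwareAllocator implements BlockAllocator {
    private final List<String> candidateNodes;

    ExclusionAwareAllocator(List<String> candidateNodes) {
        this.candidateNodes = candidateNodes;
    }

    public List<String> allocateBlock(String path, int replication, Set<String> excludedNodes) {
        List<String> pipeline = new ArrayList<String>();
        for (String node : candidateNodes) {
            if (excludedNodes.contains(node)) {
                continue; // the client has already failed to reach this node
            }
            pipeline.add(node);
            if (pipeline.size() == replication) {
                break; // e.g. a replication count of 2, as in the test setup above
            }
        }
        return pipeline;
    }
}
{code}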

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

