hadoop-hdfs-issues mailing list archives

From "Ruyue Ma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.
Date Fri, 18 Sep 2009 07:08:57 GMT

    [ https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12757019#action_12757019 ]

Ruyue Ma commented on HDFS-630:
-------------------------------

Ruyue Ma added a comment - 20/Jul/09 11:32 PM
to: dhruba borthakur

> This is not related to HDFS-4379. Let me explain why.
> The problem is actually related to HDFS-xxx. The namenode waits for 10 minutes after
> losing heartbeats from a datanode before declaring it dead. During those 10 minutes, the NN
> is free to choose the dead datanode as a possible replica for a newly allocated block.

> If during a write the dfsclient sees that a block replica location for a newly allocated
> block is not connectable, it re-requests the NN to get a fresh set of replica locations for
> the block. It tries this dfs.client.block.write.retries times (default 3), sleeping 6 seconds
> between each retry (see DFSClient.nextBlockOutputStream).
> This setting works well when you have a reasonably sized cluster; if you have only 4
> datanodes in the cluster, every retry picks the dead datanode and the above logic bails out.

> One solution is to change the value of dfs.client.block.write.retries to a much larger
> value, say 200 or so. Better still, increase the number of nodes in your cluster.
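
For reference, raising that retry count is a client-side configuration change; with the value
suggested above, it would look roughly like this in hdfs-site.xml:

    <property>
      <name>dfs.client.block.write.retries</name>
      <value>200</value>
    </property>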

Our modification: when requesting a block location from the namenode, the client now passes
the NN the list of datanodes to exclude (see the patch below). The list of dead datanodes is
only kept for one block allocation.

+++ hadoop-new/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java 2009-07-20 00:19:03.000000000 +0800
@@ -2734,6 +2734,7 @@
         LocatedBlock lb = null;
         boolean retry = false;
         DatanodeInfo[] nodes;
+        DatanodeInfo[] excludedNodes = null;
         int count = conf.getInt("dfs.client.block.write.retries", 3);
         boolean success;
         do {
@@ -2745,7 +2746,7 @@
           success = false;

           long startTime = System.currentTimeMillis();
-          lb = locateFollowingBlock(startTime);
+          lb = locateFollowingBlock(startTime, excludedNodes);
           block = lb.getBlock();
           nodes = lb.getLocations();

@@ -2755,6 +2756,19 @@
           success = createBlockOutputStream(nodes, clientName, false);

           if (!success) {
+
+            LOG.info("Excluding node: " + nodes[errorIndex]);
+            // Mark this datanode as excluded for the next allocation attempt
+            DatanodeInfo errorNode = nodes[errorIndex];
+            if (excludedNodes != null) {
+              DatanodeInfo[] newExcludedNodes = new DatanodeInfo[excludedNodes.length + 1];
+              System.arraycopy(excludedNodes, 0, newExcludedNodes, 0, excludedNodes.length);
+              newExcludedNodes[excludedNodes.length] = errorNode;
+              excludedNodes = newExcludedNodes;
+            } else {
+              excludedNodes = new DatanodeInfo[] { errorNode };
+            }
+
             LOG.info("Abandoning block " + block);
             namenode.abandonBlock(block, src, clientName);
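
The hunk above changes only the call site; the matching change to locateFollowingBlock is not
shown in this comment. A minimal sketch of what it might look like, assuming a hypothetical
addBlock overload on the namenode that accepts the nodes to exclude (the overload name and
signature here are illustrative, not the committed API):

    // Hypothetical sketch, not part of the patch above: forward the excluded
    // nodes to the namenode so block placement can skip them for this block.
    private LocatedBlock locateFollowingBlock(long start,
                                              DatanodeInfo[] excludedNodes)
        throws IOException {
      // Existing retry/sleep loop elided; only the RPC call changes.
      return namenode.addBlock(src, clientName, excludedNodes);
    }

On the namenode side, the excluded list would then have to flow into the block placement
policy's target selection so a dead datanode is not picked again for the same allocation.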

dhruba borthakur added a comment - 22/Jul/09 07:14 AM
Hi Ruyue, your option of excluding specific datanodes (specified by the client) sounds reasonable.
This might help in the case of network partitioning, where a specific client loses access to
a set of datanodes while those datanodes are alive and well and able to send heartbeats to
the namenode. Can you please create a separate JIRA for your proposed fix and attach your patch
there? Thanks.


> In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes
when locating the next block.
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-630
>                 URL: https://issues.apache.org/jira/browse/HDFS-630
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs client
>    Affects Versions: 0.20.1, 0.21.0
>            Reporter: Ruyue Ma
>            Assignee: Ruyue Ma
>            Priority: Minor
>             Fix For: 0.21.0
>
>
> Created from HDFS-200.
> If during a write the dfsclient sees that a block replica location for a newly allocated
> block is not connectable, it re-requests the NN to get a fresh set of replica locations for
> the block. It tries this dfs.client.block.write.retries times (default 3), sleeping 6 seconds
> between each retry (see DFSClient.nextBlockOutputStream).
> This setting works well when you have a reasonably sized cluster; if you have few datanodes
> in the cluster, every retry may pick the dead datanode and the above logic bails out.
> Our solution: when requesting a block location from the namenode, the client passes the NN
> the list of datanodes to exclude. The list of dead datanodes is only kept for one block allocation.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

