hadoop-hdfs-issues mailing list archives

From "Ruyue Ma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-200) In HDFS, sync() not yet guarantees data available to the new readers
Date Tue, 21 Jul 2009 06:33:14 GMT

    [ https://issues.apache.org/jira/browse/HDFS-200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733502#action_12733502 ]

Ruyue Ma commented on HDFS-200:
-------------------------------

to: dhruba borthakur 

> This is not related to HDFS-4379. Let me explain why.

> The problem is actually related to HDFS-xxx. The namenode waits for 10 minutes after
> losing heartbeats from a datanode before declaring it dead. During those 10 minutes, the
> NN is free to choose the dead datanode as a possible replica for a newly allocated block.

> If, during a write, the dfsclient sees that a block replica location for a newly allocated
> block is not connectable, it re-requests the NN for a fresh set of replica locations for
> the block. It tries this dfs.client.block.write.retries times (default 3), sleeping 6
> seconds between each retry (see DFSClient.nextBlockOutputStream).

> This setting works well when you have a reasonably sized cluster; if you have only 4
> datanodes in the cluster, every retry picks the dead datanode and the above logic bails out.

> One solution is to change the value of dfs.client.block.write.retries to a much larger
> value, say 200 or so. Better still, increase the number of nodes in your cluster.
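The retry behavior described in the quote above can be modeled with a short standalone sketch. The names and signatures here are illustrative only, not the real DFSClient API: the client asks the namenode for a target node up to `retries` times and bails out if every attempt returns an unreachable node.

```java
import java.util.List;
import java.util.function.Supplier;

public class RetrySketch {
    // Hypothetical model of the retry loop: locateNode stands in for the
    // NN's replica choice, deadNodes for nodes the client cannot reach,
    // and retries for dfs.client.block.write.retries (default 3).
    static String allocateBlock(Supplier<String> locateNode,
                                List<String> deadNodes, int retries) {
        for (int i = 0; i < retries; i++) {
            String node = locateNode.get();   // NN picks a replica target
            if (!deadNodes.contains(node)) {
                return node;                  // connectable replica found
            }
            // The real client sleeps ~6 seconds here before re-requesting.
        }
        return null;                          // every retry hit a dead node
    }
}
```

On a small cluster where the NN keeps handing back the same dead node, every retry fails, which is exactly the bail-out the quote describes.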

Our modification: when requesting block locations from the namenode, we pass the NN the list of datanodes to exclude. The exclusion list applies only to a single block allocation.

+++ hadoop-new/src/hdfs/org/apache/hadoop/hdfs/DFSClient.java	2009-07-20 00:19:03.000000000
+0800
@@ -2734,6 +2734,7 @@
       LocatedBlock lb = null;
       boolean retry = false;
       DatanodeInfo[] nodes;
+      DatanodeInfo[] excludedNodes = null;
       int count = conf.getInt("dfs.client.block.write.retries", 3);
       boolean success;
       do {
@@ -2745,7 +2746,7 @@
         success = false;
                 
         long startTime = System.currentTimeMillis();
-        lb = locateFollowingBlock(startTime);
+        lb = locateFollowingBlock(startTime, excludedNodes);
         block = lb.getBlock();
         nodes = lb.getLocations();
   
@@ -2755,6 +2756,19 @@
         success = createBlockOutputStream(nodes, clientName, false);
 
         if (!success) {
+
+          LOG.info("Excluding node: " + nodes[errorIndex]);
+          // Mark datanode as excluded
+          DatanodeInfo errorNode = nodes[errorIndex];
+          if (excludedNodes != null) {
+            DatanodeInfo[] newExcludedNodes = new DatanodeInfo[excludedNodes.length + 1];
+            System.arraycopy(excludedNodes, 0, newExcludedNodes, 0, excludedNodes.length);
+            newExcludedNodes[excludedNodes.length] = errorNode;
+            excludedNodes = newExcludedNodes;
+          } else {
+            excludedNodes = new DatanodeInfo[] { errorNode };
+          }
+
           LOG.info("Abandoning block " + block);
           namenode.abandonBlock(block, src, clientName);
 

> In HDFS, sync() not yet guarantees data available to the new readers
> --------------------------------------------------------------------
>
>                 Key: HDFS-200
>                 URL: https://issues.apache.org/jira/browse/HDFS-200
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: dhruba borthakur
>            Priority: Blocker
>         Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, fsyncConcurrentReaders11_20.txt,
>                      fsyncConcurrentReaders12_20.txt, fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch,
>                      fsyncConcurrentReaders5.txt, fsyncConcurrentReaders6.patch, fsyncConcurrentReaders9.patch,
>                      hadoop-stack-namenode-aa0-000-12.u.powerset.com.log.gz, hypertable-namenode.log.gz, namenode.log,
>                      namenode.log, Reader.java, Reader.java, reopen_test.sh, ReopenProblem.java, Writer.java, Writer.java
>
>
> In the append design doc (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc),
> it says
> * A reader is guaranteed to be able to read data that was 'flushed' before the reader
>   opened the file
> However, this feature is not yet implemented.  Note that the operation 'flushed' is now
> called "sync".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

