hadoop-hdfs-issues mailing list archives

From "Cosmin Lehene (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.
Date Fri, 13 Nov 2009 13:44:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777500#action_12777500 ]

Cosmin Lehene commented on HDFS-630:
------------------------------------

stack: I can't reproduce it on 0.21. I did find it in the NN log before upgrading the HBase
jar to the patched HDFS.

java.io.IOException: Cannot complete block: block has not been COMMITTED by the client
        at org.apache.hadoop.hdfs.server.namenode.BlockInfoUnderConstruction.convertToCompleteBlock(BlockInfoUnderConstruction.java:158)
        at org.apache.hadoop.hdfs.server.namenode.BlockManager.completeBlock(BlockManager.java:288)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1243)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:637)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:621)
        at sun.reflect.GeneratedMethodAccessor48.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:516)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:960)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:958)

I should point out that

 at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:621)

line 621 in NameNode means it was called from an unpatched DFSClient that calls the old
NameNode interface:
line 621: return addBlock(src, clientName, null, null);

This is part of public LocatedBlock addBlock(String src, String clientName, Block previous):

  @Override
  public LocatedBlock addBlock(String src, String clientName,
                               Block previous)
    throws IOException {
    return addBlock(src, clientName, null, null);
  }

This is different from your stack trace (http://pastie.org/695936), which calls the complete()
method.

However, could you search for the same error while adding a new block with addBlock() (as in
my case)? If you find it, you could figure out the entry point in NameNode, and if it's
line 621 you might have an unpatched DFSClient.

However, even with an unpatched DFSClient, I have yet to figure out why it would cause
this. Perhaps I should get a better understanding of the cause of the exception. So far, from
the code comments in BlockInfoUnderConstruction, I have that
"the state of the block (the generation stamp and the length) has not been committed by the
client or it does not have at least a minimal number of replicas reported from data-nodes."

> In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-630
>                 URL: https://issues.apache.org/jira/browse/HDFS-630
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs client
>    Affects Versions: 0.21.0
>            Reporter: Ruyue Ma
>            Assignee: Ruyue Ma
>            Priority: Minor
>             Fix For: 0.21.0
>
>         Attachments: 0001-Fix-HDFS-630-for-0.21.patch, HDFS-630.patch
>
>
> created from hdfs-200.
> If during a write, the dfsclient sees that a block replica location for a newly allocated
block is not-connectable, it re-requests the NN to get a fresh set of replica locations of
the block. It tries this dfs.client.block.write.retries times (default 3), sleeping 6 seconds
between each retry ( see DFSClient.nextBlockOutputStream).
> This setting works well when you have a reasonable size cluster; if u have few datanodes
in the cluster, every retry maybe pick the dead-datanode and the above logic bails out.
> Our solution: when getting block location from namenode, we give nn the excluded datanodes.
The list of dead datanodes is only for one block allocation.
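
The retry-with-exclusion behavior described above can be sketched as follows (hypothetical
names; not the real DFSClient/NameNode API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the patch's retry logic: each time a connection to a
// chosen datanode fails, that node goes on an exclude list that is sent back
// to the NameNode with the next block-location request. The exclude list is
// reset for every block allocation. Names are illustrative, not the real
// DFSClient/NameNode API.
class ExcludeRetrySketch {

  static List<String> allocateBlock(List<String> clusterNodes,
                                    List<String> deadNodes, int retries) {
    List<String> excluded = new ArrayList<>(); // per-block-allocation list
    for (int i = 0; i < retries; i++) {
      String candidate = pickReplica(clusterNodes, excluded);
      if (candidate == null) {
        break; // NameNode has no non-excluded node left to offer
      }
      if (!deadNodes.contains(candidate)) {
        return List.of(candidate); // "connected" successfully
      }
      excluded.add(candidate); // dead: ask the NN to avoid it next time
    }
    return List.of(); // bail out, as the old logic would
  }

  // Stand-in for the NameNode choosing a target while honoring the
  // client-supplied exclude list.
  static String pickReplica(List<String> nodes, List<String> excluded) {
    for (String n : nodes) {
      if (!excluded.contains(n)) {
        return n;
      }
    }
    return null;
  }
}
```

Without the exclude list, the stand-in pickReplica would keep returning the same dead node
on a small cluster, which is exactly the failure mode the description above reports.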

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

