hadoop-hdfs-issues mailing list archives

From "Cosmin Lehene (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-630) In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes when locating the next block.
Date Fri, 13 Nov 2009 13:44:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777500#action_12777500 ]

Cosmin Lehene commented on HDFS-630:

stack: I can't reproduce it on 0.21. I did find it in the NN log before upgrading the HBase
jar to the patched HDFS.

java.io.IOException: Cannot complete block: block has not been COMMITTED by the client
        at org.apache.hadoop.hdfs.server.namenode.BlockInfoUnderConstruction.convertToCompleteBlock(BlockInfoUnderConstruction.java:158)
        at org.apache.hadoop.hdfs.server.namenode.BlockManager.completeBlock(BlockManager.java:288)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1243)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:637)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:621)
        at sun.reflect.GeneratedMethodAccessor48.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:516)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:964)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:960)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:958)

I should point out that
 at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:621)

line 621 in the NameNode means it was called from an unpatched DFSClient that calls the old
NameNode interface.
line 621: return addBlock(src, clientName, null, null);

This is part of public LocatedBlock addBlock(String src, String clientName, Block previous):

  public LocatedBlock addBlock(String src, String clientName,
                               Block previous)
    throws IOException {
    return addBlock(src, clientName, null, null);
  }
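To make the delegation concrete, here is a compilable sketch of how the old and new addBlock overloads might relate (the HDFS types are stand-ins and the exact patched signatures may differ; only the delegation pattern is taken from the quoted code):

```java
import java.io.IOException;

public class AddBlockSketch {
    // Stand-ins for HDFS types, for illustration only.
    static class Block {}
    static class DatanodeInfo {
        final String host;
        DatanodeInfo(String host) { this.host = host; }
    }
    static class LocatedBlock {
        final DatanodeInfo[] excluded;
        LocatedBlock(DatanodeInfo[] excluded) { this.excluded = excluded; }
    }

    // Old interface: no excluded nodes. An unpatched DFSClient lands here,
    // which is why NameNode.java:621 shows up in the stack trace.
    LocatedBlock addBlock(String src, String clientName, Block previous)
            throws IOException {
        return addBlock(src, clientName, previous, null);
    }

    // New interface from HDFS-630: the client can name datanodes the
    // NameNode should skip when allocating the next block.
    LocatedBlock addBlock(String src, String clientName, Block previous,
                          DatanodeInfo[] excludedNodes) throws IOException {
        // Real code would run block placement; this sketch just echoes
        // the exclusion list so the delegation is observable.
        return new LocatedBlock(excludedNodes == null
                ? new DatanodeInfo[0] : excludedNodes);
    }
}
```

The point of keeping the old overload is wire compatibility: an unpatched client simply gets the pre-HDFS-630 behavior (no exclusions) instead of an error.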

This is different from your stack trace (http://pastie.org/695936), which calls complete().

However, could you search for the same error while adding a new block with addBlock() (like
mine)? If you find it, you could figure out the entry point in the NameNode, and if it's
line 621 you might have an unpatched DFSClient.

However, even with an unpatched DFSClient, I still fail to figure out why it would cause
this. Perhaps I should get a better understanding of the cause of the exception. So far, from
the code comments in BlockInfoUnderConstruction, I have that
"the state of the block (the generation stamp and the length) has not been committed by the
client or it does not have at least a minimal number of replicas reported from data-nodes."
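That comment describes two guards on completing a block. Here is a minimal sketch of the condition (the state names follow HDFS conventions, but the method body, field names, and MIN_REPLICAS constant are assumptions, not the actual BlockInfoUnderConstruction code):

```java
import java.io.IOException;

public class CompleteBlockSketch {
    // Construction states a block moves through before becoming COMPLETE.
    enum BlockUCState { UNDER_CONSTRUCTION, COMMITTED, COMPLETE }

    static final int MIN_REPLICAS = 1; // assumed minimum replica count

    // Hypothetical guard mirroring the quoted comment: a block converts to
    // COMPLETE only after the client has COMMITTED it (fixing the generation
    // stamp and length) and enough datanodes have reported replicas.
    static void convertToCompleteBlock(BlockUCState state, int reportedReplicas)
            throws IOException {
        if (state != BlockUCState.COMMITTED) {
            throw new IOException(
                "Cannot complete block: block has not been COMMITTED by the client");
        }
        if (reportedReplicas < MIN_REPLICAS) {
            throw new IOException(
                "Cannot complete block: not enough replicas reported");
        }
    }
}
```

Under this reading, the exception in the log means getAdditionalBlock tried to complete the previous block while it was still UNDER_CONSTRUCTION, i.e. before the client committed it.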

> In DFSOutputStream.nextBlockOutputStream(), the client can exclude specific datanodes
when locating the next block.
> -------------------------------------------------------------------------------------------------------------------
>                 Key: HDFS-630
>                 URL: https://issues.apache.org/jira/browse/HDFS-630
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs client
>    Affects Versions: 0.21.0
>            Reporter: Ruyue Ma
>            Assignee: Ruyue Ma
>            Priority: Minor
>             Fix For: 0.21.0
>         Attachments: 0001-Fix-HDFS-630-for-0.21.patch, HDFS-630.patch
> created from hdfs-200.
> If, during a write, the dfsclient sees that a block replica location for a newly allocated
block is not connectable, it re-requests the NN for a fresh set of replica locations for
the block. It tries this dfs.client.block.write.retries times (default 3), sleeping 6 seconds
between each retry (see DFSClient.nextBlockOutputStream).
> This setting works well when you have a reasonably sized cluster; if you have few datanodes
in the cluster, every retry may pick the dead datanode and the above logic bails out.
> Our solution: when getting a block location from the namenode, we give the NN the excluded
datanodes. The list of dead datanodes is only for one block allocation.
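The retry scheme in that description can be sketched as a client-side loop (all names here are hypothetical stand-ins; the real logic lives in DFSClient.nextBlockOutputStream and the patched addBlock RPC):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class ExcludeRetrySketch {
    // Stand-in for the patched NameNode RPC: allocate a target datanode,
    // skipping any host in the excluded list.
    interface Namenode {
        String allocate(List<String> excluded) throws IOException;
    }

    // Retry block allocation, excluding each datanode we fail to reach.
    // The excluded list is scoped to this one block allocation, as the
    // issue description specifies.
    static String nextBlockTarget(Namenode nn, int retries,
                                  java.util.function.Predicate<String> connectable)
            throws IOException {
        List<String> excluded = new ArrayList<>();
        for (int i = 0; i < retries; i++) {
            String target = nn.allocate(excluded);
            if (connectable.test(target)) {
                return target;          // found a reachable datanode
            }
            excluded.add(target);       // don't ask for this node again
        }
        throw new IOException("Could not obtain a connectable block target");
    }
}
```

On a small cluster this converges quickly: each failed attempt removes one dead datanode from consideration instead of letting every retry land on the same dead node.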

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
