hadoop-hdfs-issues mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1056) Multi-node RPC deadlocks during block recovery
Date Mon, 22 Mar 2010 21:27:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848346#action_12848346 ]

Tsz Wo (Nicholas), SZE commented on HDFS-1056:
----------------------------------------------

> I think the solution may be to determine the "equality" of the DNs based on IP and ipcPort,
> not by name (which is the xceiver port). There may be issues with this, though - have to think
> through it more thoroughly.

Then, setting ipcPort (i.e. dfs.datanode.ipc.address) to 0.0.0.0:0 would result in the same problem.
The question is: how should a Datanode be identified? It seems that we were using name and
storageID, where name = machineName + ":" + port.
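
To make the two candidate identity rules concrete, here is a minimal Java sketch. It is not Hadoop code: NodeRecord, its fields, the addresses, and the port numbers are all hypothetical, and treating ipcPort 0 as "not yet resolved" is an assumption about how the 0.0.0.0:0 case might show up. It only illustrates that comparing by name and comparing by (IP, ipcPort) can each fail to recognize the same node.

```java
// Hypothetical sketch; NodeRecord is NOT Hadoop's DatanodeID.
class DatanodeIdentitySketch {

    /** One party's view of a datanode. All fields are illustrative. */
    static final class NodeRecord {
        final String name;   // machineName + ":" + port (the xceiver port)
        final String ip;
        final int ipcPort;   // assumption: 0 means "not yet resolved"

        NodeRecord(String machineName, int port, String ip, int ipcPort) {
            this.name = machineName + ":" + port;
            this.ip = ip;
            this.ipcPort = ipcPort;
        }
    }

    /** Rule 1: two records denote the same node if their names match. */
    static boolean sameByName(NodeRecord a, NodeRecord b) {
        return a.name.equals(b.name);
    }

    /** Rule 2: two records denote the same node if IP and ipcPort match. */
    static boolean sameByIpAndIpcPort(NodeRecord a, NodeRecord b) {
        return a.ip.equals(b.ip) && a.ipcPort == b.ipcPort;
    }

    public static void main(String[] args) {
        // The same node, recorded once under a hostname and once under its IP.
        NodeRecord byHostname = new NodeRecord("dn1.example.com", 50010, "10.0.0.5", 50020);
        NodeRecord byIp = new NodeRecord("10.0.0.5", 50010, "10.0.0.5", 50020);
        // The same node again, but with the ipcPort still at the configured 0.
        NodeRecord unresolvedIpc = new NodeRecord("10.0.0.5", 50010, "10.0.0.5", 0);

        System.out.println("by name, hostname vs IP:        " + sameByName(byHostname, byIp));
        System.out.println("by IP+ipcPort, hostname vs IP:  " + sameByIpAndIpcPort(byHostname, byIp));
        System.out.println("by IP+ipcPort, 50020 vs 0:      " + sameByIpAndIpcPort(byIp, unresolvedIpc));
        System.out.println("by name, 50020 vs 0:            " + sameByName(byIp, unresolvedIpc));
    }
}
```

In this sketch each rule misses one of the pairs, which is the sense in which switching the comparison key may only move the problem rather than remove it.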

> Multi-node RPC deadlocks during block recovery
> ----------------------------------------------
>
>                 Key: HDFS-1056
>                 URL: https://issues.apache.org/jira/browse/HDFS-1056
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>
> Believe it or not, I'm seeing HADOOP-3657 / HADOOP-3673 in a 5-node 0.20 cluster. I have
> many concurrent writes on the cluster, and when I kill a DN, some percentage of the time I
> get one of these cross-node deadlocks among 3 of the nodes (replication 3). All of the DN
> RPC server threads are tied up waiting on RPC clients to other datanodes.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

