hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1056) Multi-node RPC deadlocks during block recovery
Date Mon, 22 Mar 2010 21:43:27 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848369#action_12848369 ]

Todd Lipcon commented on HDFS-1056:
-----------------------------------

FWIW, I made the suggested fix (comparing based on host and ipcPort) on the cluster in question,
and the problem went away. It may cause other problems, though. I'll think about it a bit
and post a patch and test case soon.
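The comment does not include the patch. As a minimal sketch, assuming the point of the comparison is to let a datanode recognize when a block-recovery target is itself (so it can call its own code directly instead of issuing an RPC back into its own handler pool), an identity check on host plus IPC port might look like this. `NodeKey`, `RecoveryTarget`, and `choose` are hypothetical names, not Hadoop classes, and the hostnames and port are illustrative.

```java
import java.util.Objects;

/**
 * Hypothetical illustration, not Hadoop's DatanodeID: a node identity that
 * compares only host and IPC port.
 */
public final class NodeKey {
    private final String host;
    private final int ipcPort;

    public NodeKey(String host, int ipcPort) {
        this.host = Objects.requireNonNull(host, "host");
        this.ipcPort = ipcPort;
    }

    /** True when both keys name the same host and the same IPC port. */
    public boolean sameNode(NodeKey other) {
        return other != null && host.equals(other.host) && ipcPort == other.ipcPort;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof NodeKey && sameNode((NodeKey) o);
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, ipcPort);
    }

    /** Hypothetical stand-in for the inter-datanode recovery call. */
    public interface RecoveryTarget {
        String recover(long blockId);
    }

    /** Use the local object if the target is this node; otherwise use the remote proxy. */
    public static RecoveryTarget choose(NodeKey self, NodeKey target,
                                        RecoveryTarget local, RecoveryTarget remote) {
        return self.sameNode(target) ? local : remote;
    }

    public static void main(String[] args) {
        NodeKey self = new NodeKey("dn1.example.com", 50020);
        RecoveryTarget local = blockId -> "local:" + blockId; // direct call, no handler thread used
        RecoveryTarget remote = blockId -> "rpc:" + blockId;  // would go through an RPC proxy
        System.out.println(choose(self, new NodeKey("dn1.example.com", 50020), local, remote).recover(7L)); // local:7
        System.out.println(choose(self, new NodeKey("dn2.example.com", 50020), local, remote).recover(7L)); // rpc:7
    }
}
```

The idea in this sketch is only that a node never consumes one of its own RPC handlers to talk to itself; whether host and ipcPort are the right identity fields in every deployment is exactly the "other problem" the comment leaves open.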

> Multi-node RPC deadlocks during block recovery
> ----------------------------------------------
>
>                 Key: HDFS-1056
>                 URL: https://issues.apache.org/jira/browse/HDFS-1056
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>
> Believe it or not, I'm seeing HADOOP-3657 / HADOOP-3673 in a 5-node 0.20 cluster. I have
> many concurrent writes on the cluster, and when I kill a DN, some percentage of the time I
> get one of these cross-node deadlocks among 3 of the nodes (replication 3). All of the DN
> RPC server threads are tied up waiting on RPC clients to other datanodes.
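To make the failure mode concrete, here is a toy model, not HDFS code (`HandlerDeadlockDemo`, `chainCompletes`, and `call` are hypothetical names). Each "datanode" is a fixed-size thread pool standing in for its RPC handlers, and each handler makes a blocking call to the next node in a three-node ring. With one handler per node, the call that wraps back to the first node queues behind a handler that is itself waiting, so nothing can finish; the demo detects this with a timeout instead of hanging.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Simplified model (not HDFS code): every node has a fixed pool of handler
 * threads, and each handler blocks on a call to the next node in a ring.
 */
public final class HandlerDeadlockDemo {

    /** Returns true if a call that hops once around the ring finishes before the timeout. */
    public static boolean chainCompletes(int nodes, int handlersPerNode, long timeoutMs)
            throws Exception {
        List<ExecutorService> handlers = new ArrayList<>();
        for (int i = 0; i < nodes; i++) {
            handlers.add(Executors.newFixedThreadPool(handlersPerNode));
        }
        try {
            // Start at node 0; the call is forwarded `nodes` times, ending back at node 0.
            Future<String> f = handlers.get(0).submit(() -> call(handlers, 0, nodes));
            f.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // every handler in the cycle is waiting on another one
        } finally {
            handlers.forEach(ExecutorService::shutdownNow); // interrupt blocked handlers
        }
    }

    /** Handler body: forward to the next node and hold this thread until it replies. */
    private static String call(List<ExecutorService> handlers, int node, int hopsLeft)
            throws Exception {
        if (hopsLeft == 0) {
            return "done at node " + node;
        }
        int next = (node + 1) % handlers.size();
        // Blocking "RPC": this handler thread stays busy until the next node answers.
        return handlers.get(next).submit(() -> call(handlers, next, hopsLeft - 1)).get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 handler per node:  completes=" + chainCompletes(3, 1, 500));
        System.out.println("2 handlers per node: completes=" + chainCompletes(3, 2, 500));
    }
}
```

Running `main` prints `completes=false` for one handler per node and `completes=true` for two. A spare handler breaks the cycle in this toy, but with enough concurrent recoveries any fixed pool can be exhausted, which is consistent with the report that all of the DN RPC server threads were tied up.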

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

