hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1056) Multi-node RPC deadlocks during block recovery
Date Mon, 23 Aug 2010 05:25:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901267#action_12901267
] 

Todd Lipcon commented on HDFS-1056:
-----------------------------------

bq. This fix could impact other code paths too, especially since the DN comparison is used
by many code paths. Maybe a unit test would be good.

Are you suggesting that the change be made to the equals() call instead of locally in the
DataNode code? As it stands, the patch Nicolas uploaded is scoped to just that bit of code,
where it has been tested a lot and the correct semantics are clear. I think changing equals()
itself would be dangerous, as it might break things in FSNamesystem, replication policy, etc.
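
To illustrate the scoping argument, here is a minimal sketch of a comparison kept local to one code path while the shared equals() stays untouched. Everything in it is hypothetical: NodeId, RecoverySketch, isLocalRpcTarget and the fields being compared are stand-ins invented for illustration, not the real Hadoop classes and not necessarily what the attached patch compares.

```java
import java.util.Objects;

// Hypothetical stand-in for a datanode identifier; NOT the real Hadoop class.
final class NodeId {
    final String host;
    final int dataPort;     // illustrative data-transfer port
    final int ipcPort;      // illustrative RPC port
    final String storageId;

    NodeId(String host, int dataPort, int ipcPort, String storageId) {
        this.host = host;
        this.dataPort = dataPort;
        this.ipcPort = ipcPort;
        this.storageId = storageId;
    }

    // Shared equality that many callers rely on; deliberately left unchanged.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof NodeId)) return false;
        NodeId other = (NodeId) o;
        return host.equals(other.host)
                && dataPort == other.dataPort
                && storageId.equals(other.storageId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(host, dataPort, storageId);
    }
}

final class RecoverySketch {
    // Narrow check used only on one code path (assumed semantics):
    // "would an RPC to this target come straight back to me?"
    static boolean isLocalRpcTarget(NodeId self, NodeId target) {
        return self.host.equals(target.host) && self.ipcPort == target.ipcPort;
    }

    public static void main(String[] args) {
        NodeId self = new NodeId("dn1", 50010, 50020, "DS-1");
        NodeId sameNodeOtherDataPort = new NodeId("dn1", 40000, 50020, "DS-1");
        System.out.println("equals(): " + self.equals(sameNodeOtherDataPort));
        System.out.println("local check: " + isLocalRpcTarget(self, sameNodeOtherDataPort));
    }
}
```

In this sketch, callers that depend on equals() (hash maps, sets, placement logic) see no behavior change; only the one path that calls the helper gets the different semantics.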

bq. also, does this problem exist in trunk?

Yep, it does. The same fix applies.

> Multi-node RPC deadlocks during block recovery
> ----------------------------------------------
>
>                 Key: HDFS-1056
>                 URL: https://issues.apache.org/jira/browse/HDFS-1056
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>             Fix For: 0.20-append
>
>         Attachments: 0013-HDFS-1056.-Fix-possible-multinode-deadlocks-during-b.patch
>
>
> Believe it or not, I'm seeing HADOOP-3657 / HADOOP-3673 in a 5-node 0.20 cluster. I have
> many concurrent writes on the cluster, and when I kill a DN, some percentage of the time I
> get one of these cross-node deadlocks among 3 of the nodes (replication 3). All of the DN
> RPC server threads are tied up waiting on RPC clients to other datanodes.
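
The shape of the deadlock in the description above can be reproduced in miniature. The sketch below is an in-process analogy, not Hadoop code: two fixed-size thread pools stand in for the RPC handler pools of two nodes (the report involves three), and each handler makes a blocking call to the other node while still holding its own handler thread. The class name, the two-node setup and the timeout are all assumptions made so the demo terminates; a real deadlock of this kind has no timeout to rescue it.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical demo class; the pools are stand-ins for per-node RPC handler threads.
final class HandlerPoolDeadlockSketch {

    /** Returns how many of the two handlers timed out waiting on their peer. */
    static int run(int handlersPerNode, long timeoutMillis) {
        ExecutorService nodeA = Executors.newFixedThreadPool(handlersPerNode);
        ExecutorService nodeB = Executors.newFixedThreadPool(handlersPerNode);
        CyclicBarrier barrier = new CyclicBarrier(2);
        try {
            Callable<Boolean> onA = () -> handle(nodeB, barrier, timeoutMillis);
            Callable<Boolean> onB = () -> handle(nodeA, barrier, timeoutMillis);
            Future<Boolean> a = nodeA.submit(onA);
            Future<Boolean> b = nodeB.submit(onB);
            int stuck = 0;
            if (a.get()) stuck++;
            if (b.get()) stuck++;
            return stuck;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            nodeA.shutdownNow();
            nodeB.shutdownNow();
        }
    }

    // One "incoming RPC": occupies a handler thread, then calls the peer and blocks.
    private static boolean handle(ExecutorService peer, CyclicBarrier barrier,
                                  long timeoutMillis) throws Exception {
        barrier.await(); // both nodes now have a handler thread occupied
        Future<String> reply = peer.submit(() -> "ok"); // nested call to the peer
        boolean timedOut;
        try {
            reply.get(timeoutMillis, TimeUnit.MILLISECONDS);
            timedOut = false;
        } catch (TimeoutException e) {
            reply.cancel(true);
            timedOut = true;
        }
        barrier.await(); // hold the thread until both sides have an outcome
        return timedOut;
    }

    public static void main(String[] args) {
        System.out.println("1 handler per node, stuck handlers: " + run(1, 200));
        System.out.println("2 handlers per node, stuck handlers: " + run(2, 200));
    }
}
```

With one handler per node, each nested call queues behind the very handler that is waiting on the other side, so neither can make progress until the timeout fires. With a spare handler per node the nested calls are served and nothing gets stuck, which is why the problem only shows up once every handler thread is tied up at the same time.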

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

