Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Wed, 16 May 2012 18:21:08 +0000 (UTC)
From: "Eli Collins (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: 
 <317653342.5062.1337192468378.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (HDFS-1056) Multi-node RPC deadlocks during block
 recovery
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HDFS-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-1056:
------------------------------

    Target Version/s: 2.0.1

Marking for 2.x; verified that trunk still needs this fix.
                
> Multi-node RPC deadlocks during block recovery
> ----------------------------------------------
>
>                 Key: HDFS-1056
>                 URL: https://issues.apache.org/jira/browse/HDFS-1056
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: data-node
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Todd Lipcon
>             Fix For: 0.20-append
>
>         Attachments: 0013-HDFS-1056.-Fix-possible-multinode-deadlocks-during-b.patch
>
>
> Believe it or not, I'm seeing HADOOP-3657 / HADOOP-3673 in a 5-node 0.20 cluster. I have many concurrent writes on the cluster, and when I kill a DN, some percentage of the time I get one of these cross-node deadlocks among 3 of the nodes (replication 3). All of the DN RPC server threads are tied up waiting on RPC clients to other datanodes.
