Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Fri, 31 Aug 2012 10:25:08 +1100 (NCT)
From: "Todd Lipcon (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <888208615.19109.1346369108315.JavaMail.jiratomcat@arcas>
Subject: [jira] [Created] (HDFS-3876) NN should not RPC to self to find
 trash defaults (causes deadlock)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Todd Lipcon created HDFS-3876:
---------------------------------

             Summary: NN should not RPC to self to find trash defaults (causes deadlock)
                 Key: HDFS-3876
                 URL: https://issues.apache.org/jira/browse/HDFS-3876
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: name-node
    Affects Versions: 3.0.0, 2.2.0-alpha
            Reporter: Todd Lipcon
            Priority: Blocker


When transitioning a standby NN (SBN) to active, I ran into the following deadlock:
- The TrashPolicy first gets loaded by an IPC Server Handler thread. Its {{initialize}} function then tries to make an RPC to the same node to find out the trash defaults.
- This happens while the NN write lock is held (since it's part of the active initialization). Hence, all of the other handler threads are already blocked waiting to get the NN lock.
- Since no handler threads are free to serve it, the RPC blocks forever and the NN never enters active state.
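
The sequence above can be reproduced in miniature. The following is only a sketch under stated assumptions, not NameNode code: a two-thread pool stands in for the IPC handler threads, a ReentrantReadWriteLock stands in for the NN lock, and the class name, method name and the "trash defaults" string are all hypothetical. One handler takes the write lock and, while holding it, either submits a request back to its own pool (the self-RPC) or reads the value locally.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SelfRpcDeadlockSketch {

    /** Returns true if the simulated transition finishes within timeoutMs. */
    public static boolean transitionCompletes(boolean rpcToSelf, long timeoutMs)
            throws Exception {
        // Hypothetical stand-ins: 2 "handler threads" and one "NN lock".
        ExecutorService handlers = Executors.newFixedThreadPool(2);
        ReentrantReadWriteLock nnLock = new ReentrantReadWriteLock();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch otherHandlerBusy = new CountDownLatch(1);

        // Handler 1: the transition to active, run under the write lock.
        Future<String> transition = handlers.submit(() -> {
            nnLock.writeLock().lock();
            try {
                lockHeld.countDown();
                // Wait until the only other handler is occupied.
                otherHandlerBusy.await();
                if (rpcToSelf) {
                    // "RPC to self": needs a free handler, but none is left.
                    return handlers.submit(() -> "trash defaults").get();
                }
                // Alternative: read the value locally, no handler needed.
                return "trash defaults";
            } finally {
                nnLock.writeLock().unlock();
            }
        });

        // Handler 2: an ordinary request that blocks waiting for the lock.
        lockHeld.await();
        handlers.submit(() -> {
            otherHandlerBusy.countDown();
            nnLock.readLock().lock();
            nnLock.readLock().unlock();
        });

        try {
            transition.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            return false; // the transition is stuck, as in the report above
        } finally {
            handlers.shutdownNow(); // interrupt the stuck handler so the JVM can exit
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("self-RPC under lock completes: "
                + transitionCompletes(true, 500));
        System.out.println("local read under lock completes: "
                + transitionCompletes(false, 500));
    }
}
```

Under these assumptions the self-RPC variant times out, because the queued request can only run on a handler that is itself waiting for the lock, while the local-read variant completes. The timeout exists only so the sketch terminates; in the situation described above nothing interrupts the wait.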

We need a general policy that the NN never makes RPCs to itself for any reason, because of the potential for deadlocks like this one.
