hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)
Date Fri, 31 Aug 2012 23:43:07 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13446495#comment-13446495 ]

Todd Lipcon commented on HDFS-3876:

bq. Somewhat related: I believe that the trash emptier thread in the NN also makes RPCs to
the NN to delete the appropriate files, in addition to configuring itself. Should we do something
about that as well?

Would be a nice improvement, but it can't cause a deadlock, since it's not being done from
an IPC handler thread. The issue here is that we have a handler thread making an IPC back
to its own server. So when there aren't enough free handler threads, the IPC can block forever:
the caller is itself holding one of the handler threads that would have to serve the call.
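
To illustrate the point, here is a minimal sketch that models the handler threads as a fixed-size java.util.concurrent pool. It is not Hadoop code: the class and method names are hypothetical, and a timeout stands in for "blocks forever" so that the example terminates.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical model: a fixed thread pool plays the role of the IPC handler threads.
public class SelfRpcDeadlockSketch {

    /**
     * Runs one "handler" task that makes a nested call back into the same pool.
     * Returns true if the nested call completed, false if it timed out.
     */
    public static boolean selfCallCompletes(int handlerCount, long timeoutMillis)
            throws Exception {
        ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
        try {
            Future<Boolean> outer = handlers.submit(() -> {
                // The "RPC to self": it needs a free handler thread to be served.
                Future<String> inner = handlers.submit(() -> "defaults");
                try {
                    inner.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    // Without a timeout, this handler would wait here forever.
                    return false;
                }
            });
            return outer.get();
        } finally {
            handlers.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("1 handler:  " + selfCallCompletes(1, 500));
        System.out.println("2 handlers: " + selfCallCompletes(2, 500));
    }
}
```

With a single handler, the nested call sits in the queue behind the very task that is waiting for it, so it times out. With a second, idle handler it completes. A thread outside the pool, such as the trash emptier, would not occupy a handler while it waits, which is why that case cannot deadlock in this way.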

> NN should not RPC to self to find trash defaults (causes deadlock)
> ------------------------------------------------------------------
>                 Key: HDFS-3876
>                 URL: https://issues.apache.org/jira/browse/HDFS-3876
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 3.0.0, 2.2.0-alpha
>            Reporter: Todd Lipcon
>            Assignee: Eli Collins
>            Priority: Blocker
>         Attachments: hdfs-3876.txt
>
> When transitioning a SBN to active, I ran into the following situation:
> - The TrashPolicy first gets loaded by an IPC Server Handler thread. The {{initialize}}
> function then tries to make an RPC to the same node to find out the defaults.
> - This is happening inside the NN write lock (since it's part of the active initialization).
> Hence, all of the other handler threads are already blocked waiting to get the NN lock.
> - Since no handler threads are free, the RPC blocks forever and the NN never enters active
> state.
> We need to have a general policy that the NN should never make RPCs to itself for any
> reason, due to the potential for deadlocks like this.
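
The three steps in the description can be sketched roughly as follows. This is a simplified model rather than the NameNode implementation: the handler pool, the lock, the constant, and every class and method name are assumptions made for illustration, and a timeout replaces the indefinite block so the example finishes.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the scenario; none of these names are Hadoop APIs.
public class WriteLockSelfRpcSketch {

    /** Stand-in for the default value the trash policy wants to look up. */
    static final long LOCAL_TRASH_INTERVAL = 60L;

    /**
     * Returns true if the lookup finished while the write lock was held.
     * viaSelfRpc=true routes the lookup through the handler pool (the bug);
     * viaSelfRpc=false reads the value in-process (the proposed policy).
     */
    public static boolean lookupCompletes(int handlerCount, boolean viaSelfRpc,
                                          long timeoutMillis) throws Exception {
        ExecutorService handlers = Executors.newFixedThreadPool(handlerCount);
        ReentrantReadWriteLock nnLock = new ReentrantReadWriteLock();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch othersRunning = new CountDownLatch(handlerCount - 1);
        try {
            // Step 1: one handler starts the transition under the write lock.
            Future<Boolean> transition = handlers.submit(() -> {
                nnLock.writeLock().lock();
                try {
                    lockHeld.countDown();
                    othersRunning.await(); // every other handler is now occupied
                    if (!viaSelfRpc) {
                        return LOCAL_TRASH_INTERVAL > 0; // direct read, no handler needed
                    }
                    // Step 3: the self-RPC is queued, but no handler is free to serve it.
                    Future<Long> rpc = handlers.submit(() -> LOCAL_TRASH_INTERVAL);
                    try {
                        rpc.get(timeoutMillis, TimeUnit.MILLISECONDS);
                        return true;
                    } catch (TimeoutException e) {
                        return false; // without a timeout this would never return
                    }
                } finally {
                    nnLock.writeLock().unlock();
                }
            });
            lockHeld.await();
            // Step 2: all remaining handlers block waiting for the lock.
            for (int i = 1; i < handlerCount; i++) {
                handlers.submit(() -> {
                    othersRunning.countDown();
                    nnLock.readLock().lock(); // waits until the write lock is released
                    nnLock.readLock().unlock();
                });
            }
            return transition.get();
        } finally {
            handlers.shutdownNow();
        }
    }
}
```

In this model, lookupCompletes(3, true, 300) returns false because the queued call cannot run until the lock holder gives up, while lookupCompletes(3, false, 300) returns true because the value is read without going through a handler at all.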

