hadoop-hdfs-issues mailing list archives

From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)
Date Sun, 02 Sep 2012 00:32:07 GMT

     [ https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eli Collins updated HDFS-3876:
------------------------------

    Attachment: hdfs-3876.txt

TestViewFsTrash failed because the test deletes "/", and we now get server defaults on the
path (to get the trash configuration). For viewfs this fails for "/" because "/" is not
associated with a file system, so we fail with a NotInMountpointException.

Updated patch: it now catches Exception rather than IOE when obtaining server defaults, so we
fail the delete when we fail to get server defaults (rather than potentially ignoring the
server trash configuration on transient errors). It also updates TestTrash to not fail the
test when the delete fails because server defaults could not be obtained (which is what
should happen in the viewfs case).
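
The catch-and-fail behavior described above can be sketched roughly as follows. This is not the
patch itself: the class TrashDefaultsSketch, the interface DefaultsSource and the method
trashIntervalOrFail are hypothetical names made up for illustration, and the sketch assumes
the viewfs failure may surface as an unchecked exception, which this message does not state.

```java
import java.io.IOException;

public class TrashDefaultsSketch {

    /** Hypothetical stand-in for whatever supplies the server-side trash settings. */
    public interface DefaultsSource {
        long trashIntervalMinutes(String path) throws IOException;
    }

    /**
     * Returns the server trash interval for the path, or throws IOException so
     * that the caller fails the delete instead of ignoring the server setting.
     */
    public static long trashIntervalOrFail(DefaultsSource source, String path)
            throws IOException {
        try {
            return source.trashIntervalMinutes(path);
        } catch (Exception e) {
            // Catching Exception (not only IOException) also covers unchecked
            // failures, so no error path silently falls back to local defaults.
            throw new IOException(
                "Failed to get server trash configuration for " + path, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A source that works: the interval is returned unchanged.
        System.out.println(trashIntervalOrFail(p -> 60L, "/user/example"));

        // A source that fails with an unchecked exception: the delete must fail.
        try {
            trashIntervalOrFail(p -> {
                throw new UnsupportedOperationException("no file system for " + p);
            }, "/");
        } catch (IOException e) {
            System.out.println("delete fails: " + e.getMessage());
        }
    }
}
```

The design choice here is that a failure to read the server configuration is treated as a
failure of the delete, never as "trash is disabled".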
                
> NN should not RPC to self to find trash defaults (causes deadlock)
> ------------------------------------------------------------------
>
>                 Key: HDFS-3876
>                 URL: https://issues.apache.org/jira/browse/HDFS-3876
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.2.0-alpha
>            Reporter: Todd Lipcon
>            Assignee: Eli Collins
>            Priority: Blocker
>         Attachments: hdfs-3876.txt, hdfs-3876.txt, hdfs-3876.txt
>
>
> When transitioning a SBN to active, I ran into the following situation:
> - the TrashPolicy first gets loaded by an IPC Server Handler thread. The {{initialize}}
>   function then tries to make an RPC to the same node to find out the defaults.
> - This is happening inside the NN write lock (since it's part of the active initialization).
>   Hence, all of the other handler threads are already blocked waiting to get the NN lock.
> - Since no handler threads are free, the RPC blocks forever and the NN never enters active
>   state.
> We need to have a general policy that the NN should never make RPCs to itself for any
> reason, due to potential for deadlocks like this.
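
The handler starvation described in the quoted report can be reproduced in miniature with a
fixed-size thread pool. The sketch below is not NameNode code: SelfRpcDeadlockSketch and
selfCallStalls are hypothetical names, the thread pool stands in for the IPC handler threads,
a ReentrantReadWriteLock stands in for the NN lock, and a timeout replaces the "blocks forever"
wait so the program terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class SelfRpcDeadlockSketch {

    /**
     * One handler takes the write lock and then submits a "self call" to the
     * same pool; blockedHandlers other handlers wait for the lock. Returns true
     * if the self call does not complete within timeoutMillis.
     */
    public static boolean selfCallStalls(int poolSize, int blockedHandlers,
                                         long timeoutMillis) throws Exception {
        if (blockedHandlers < 0 || blockedHandlers >= poolSize) {
            throw new IllegalArgumentException("need 0 <= blockedHandlers < poolSize");
        }
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        CountDownLatch lockHeld = new CountDownLatch(1);
        CountDownLatch othersStarted = new CountDownLatch(blockedHandlers);
        try {
            // Handler doing the "transition": holds the write lock throughout.
            Future<Boolean> transition = pool.submit(() -> {
                lock.writeLock().lock();
                try {
                    lockHeld.countDown();
                    othersStarted.await(); // the other handlers now occupy threads
                    // The self call needs a free handler thread from the same pool.
                    Future<String> selfCall = pool.submit(() -> "server defaults");
                    try {
                        selfCall.get(timeoutMillis, TimeUnit.MILLISECONDS);
                        return false;
                    } catch (TimeoutException e) {
                        return true; // no handler was free to serve the call
                    }
                } finally {
                    lock.writeLock().unlock();
                }
            });
            // Other handlers: each occupies a thread while waiting for the lock.
            for (int i = 0; i < blockedHandlers; i++) {
                pool.submit(() -> {
                    othersStarted.countDown();
                    lockHeld.await();
                    lock.readLock().lock();
                    lock.readLock().unlock();
                    return null;
                });
            }
            return transition.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("all handlers busy, stalls: " + selfCallStalls(4, 3, 200));
        System.out.println("one handler free, stalls: " + selfCallStalls(4, 2, 2000));
    }
}
```

With every handler either holding or waiting for the lock, the self call has no thread to run
on; leaving even one handler free lets it complete, which is why the problem only shows up
under load or during transitions like the one described.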

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
