hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3876) NN should not RPC to self to find trash defaults (causes deadlock)
Date Tue, 04 Sep 2012 17:26:08 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13447830#comment-13447830 ]

Todd Lipcon commented on HDFS-3876:

- When you clobber the trash interval in the configuration, you should do it on a copy, rather
than modifying the config that the user passed in.
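
A minimal sketch of the copy-then-override idea, for illustration only. It uses a plain {{Map}} as a stand-in for a Hadoop {{Configuration}} so that it runs without Hadoop on the classpath; the class and method names ({{TrashConfCopySketch}}, {{withTrashInterval}}) are hypothetical and not part of the patch.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in sketch: a Map plays the role of a Hadoop Configuration (assumption).
final class TrashConfCopySketch {
    static final String FS_TRASH_INTERVAL_KEY = "fs.trash.interval";

    // Returns a copy with the trash interval overridden.
    // The map passed in by the caller is never modified.
    static Map<String, String> withTrashInterval(Map<String, String> userConf, long minutes) {
        Map<String, String> copy = new HashMap<>(userConf);
        copy.put(FS_TRASH_INTERVAL_KEY, Long.toString(minutes));
        return copy;
    }

    public static void main(String[] args) {
        Map<String, String> userConf = new HashMap<>();
        userConf.put(FS_TRASH_INTERVAL_KEY, "0");

        Map<String, String> effective = withTrashInterval(userConf, 60);

        System.out.println("user conf:      " + userConf.get(FS_TRASH_INTERVAL_KEY));
        System.out.println("effective conf: " + effective.get(FS_TRASH_INTERVAL_KEY));
    }
}
```

With the real class, the equivalent would presumably be {{new Configuration(conf)}} followed by {{setLong}} on the copy.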

+      // If we can not determine that trash is enabled server side then
+      // bail rather than potentially deleting a file when trash is enabled.
+      System.err.println("Failed to determine server trash configuration: "
+          + e.getMessage());
+      return false;

This doesn't seem to be what happens. See the TODO in {{Delete.java}}:
      // TODO: if the user wants the trash to be used but there is any
      // problem (ie. creating the trash dir, moving the item to be deleted,
      // etc), then the path will just be deleted because moveToTrash returns
      // false and it falls thru to fs.delete.  this doesn't seem right

If you actually want it to bail, it should probably throw an exception. Alternatively, keep
returning false but remove this comment, and address the TODO mentioned above separately.
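
To make the "throw instead of return false" option concrete, here is a rough sketch. The {{TrashLookup}} and {{Fs}} interfaces and the {{remove}} method are hypothetical stand-ins, not the actual {{Delete.java}} or {{Trash}} code; the only point is that a failed lookup propagates as an exception, so the caller cannot fall through to a plain delete.

```java
import java.io.IOException;

final class DeleteBailSketch {
    // Hypothetical stand-in for asking the server for its trash interval.
    interface TrashLookup {
        long serverTrashInterval() throws IOException;
    }

    // Hypothetical stand-in for the two file system operations involved.
    interface Fs {
        void moveToTrash(String path) throws IOException;
        void delete(String path) throws IOException;
    }

    static void remove(String path, TrashLookup lookup, Fs fs) throws IOException {
        long interval;
        try {
            interval = lookup.serverTrashInterval();
        } catch (IOException e) {
            // Bail by throwing: nothing is deleted when the trash setting is unknown.
            throw new IOException(
                "Failed to determine server trash configuration: " + e.getMessage(), e);
        }
        if (interval > 0) {
            fs.moveToTrash(path); // trash enabled on the server
        } else {
            fs.delete(path);      // trash disabled, delete directly
        }
    }
}
```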

+    Configuration clientConf = new Configuration(serverConf);
+    if (clientTrash) {
+      clientConf.setLong(FS_TRASH_INTERVAL_KEY, 200);
+    }

Since you instantiate the client conf from {{serverConf}}, won't you end up with client trash
enabled even if {{clientTrash}} is false?
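
One way to avoid the inherited value, sketched with a plain {{Map}} standing in for {{Configuration}} (an assumption so that the example runs without Hadoop). The helper name {{clientConf}} is hypothetical. The key is set in both branches, with 0 meaning trash is disabled.

```java
import java.util.HashMap;
import java.util.Map;

final class ClientTrashConfSketch {
    static final String FS_TRASH_INTERVAL_KEY = "fs.trash.interval";

    // Builds a client conf from the server conf without inheriting its trash setting.
    static Map<String, String> clientConf(Map<String, String> serverConf, boolean clientTrash) {
        // The copy starts out with whatever interval the server conf carries.
        Map<String, String> conf = new HashMap<>(serverConf);
        // Overwrite in both cases so the server's value never leaks into the client.
        conf.put(FS_TRASH_INTERVAL_KEY, clientTrash ? "200" : "0");
        return conf;
    }

    public static void main(String[] args) {
        Map<String, String> serverConf = new HashMap<>();
        serverConf.put(FS_TRASH_INTERVAL_KEY, "100");

        System.out.println("clientTrash=false: "
            + clientConf(serverConf, false).get(FS_TRASH_INTERVAL_KEY));
        System.out.println("clientTrash=true:  "
            + clientConf(serverConf, true).get(FS_TRASH_INTERVAL_KEY));
    }
}
```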
> NN should not RPC to self to find trash defaults (causes deadlock)
> ------------------------------------------------------------------
>                 Key: HDFS-3876
>                 URL: https://issues.apache.org/jira/browse/HDFS-3876
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 2.2.0-alpha
>            Reporter: Todd Lipcon
>            Assignee: Eli Collins
>            Priority: Blocker
>         Attachments: hdfs-3876.txt, hdfs-3876.txt, hdfs-3876.txt
> When transitioning an SBN to active, I ran into the following situation:
> - The TrashPolicy first gets loaded by an IPC Server Handler thread. The {{initialize}} function then tries to make an RPC to the same node to find out the defaults.
> - This happens inside the NN write lock (since it's part of the active initialization). Hence, all of the other handler threads are already blocked waiting to get the NN lock.
> - Since no handler threads are free, the RPC blocks forever and the NN never enters the active state.
> We need to have a general policy that the NN should never make RPCs to itself for any reason, due to the potential for deadlocks like this.
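
The failure mode described in the issue can be reproduced in miniature with a thread pool. This is a simplified model, not NameNode code: a single-thread pool stands in for a handler pool whose other threads are all blocked on the NN lock, and a timeout is added only so that the sketch terminates (the real RPC blocks forever). All names are hypothetical.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class SelfRpcDeadlockSketch {
    // Returns true if the nested "RPC to self" could not complete in time.
    static boolean selfCallStalls(long timeoutMillis)
            throws InterruptedException, ExecutionException {
        // One handler thread models a pool with no free handlers.
        ExecutorService handlers = Executors.newFixedThreadPool(1);
        try {
            Future<Boolean> outer = handlers.submit(() -> {
                // The handler asks its own pool for the defaults ...
                Future<String> selfRpc = handlers.submit(() -> "trash defaults");
                try {
                    // ... and waits, but no handler is free to serve the request.
                    selfRpc.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    return false;
                } catch (TimeoutException e) {
                    return true;
                }
            });
            return outer.get();
        } finally {
            handlers.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("self call stalled: " + selfCallStalls(200));
    }
}
```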

