hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5790) LeaseManager.findPath is very slow when many leases need recovery
Date Fri, 17 Jan 2014 00:39:19 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13874219#comment-13874219 ]

Todd Lipcon commented on HDFS-5790:
-----------------------------------

As a quick check of the above, I applied the following patch:

{code}
diff --git a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
index 8b5fb81..62e60da 100644
--- a/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/namenode/LeaseManager.java
@@ -40,6 +40,7 @@
 import org.apache.hadoop.util.Daemon;
 
 import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Objects;
 import com.google.common.base.Preconditions;
 
 /**
@@ -256,6 +257,16 @@ public boolean expiredSoftLimit() {
      * @return the path associated with the pendingFile and null if not found.
      */
     private String findPath(INodeFile pendingFile) {
+      String retOrig = findPathOrig(pendingFile);
+      String retNew = pendingFile.getFullPathName();
+      if (!Objects.equal(retOrig, retNew)) {
+        throw new AssertionError("orig implementation found: " + retOrig +
+                                 " new implementation found: " + retNew);
+      }
+      return retNew;
+    }
+
+    private String findPathOrig(INodeFile pendingFile) {
       try {
         for (String src : paths) {
           INode node = fsnamesystem.dir.getINode(src);
{code}

That is to say, the patch runs the suggested optimization alongside the original implementation and verifies that both return the same result. I ran all the HDFS tests and they all passed, indicating that this optimization likely wouldn't break anything. And it should be much faster, since it's O(directory depth) instead of O(number of files held under the client's lease * each file's directory depth).
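
For reference, if the cross-check holds and the scaffolding above is removed, the method would presumably collapse to something like the sketch below (a rough sketch based on the patch above, not the committed change):

{code}
    /**
     * @return the path associated with the pendingFile, resolved directly from
     * the inode rather than by scanning every path held under the lease.
     */
    private String findPath(INodeFile pendingFile) {
      // getFullPathName() walks up the inode's parent pointers, so the cost is
      // O(directory depth) regardless of how many files the client has open.
      return pendingFile.getFullPathName();
    }
{code}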

Anyone have opinions on this? [~kihwal] or [~daryn] maybe? (I seem to recall both of you working in this area a few months back.)

> LeaseManager.findPath is very slow when many leases need recovery
> -----------------------------------------------------------------
>
>                 Key: HDFS-5790
>                 URL: https://issues.apache.org/jira/browse/HDFS-5790
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode, performance
>    Affects Versions: 2.4.0
>            Reporter: Todd Lipcon
>
> We recently saw an issue where the NN restarted while tens of thousands of files were
> open. The NN then ended up spending multiple seconds for each commitBlockSynchronization()
> call, spending most of its time inside LeaseManager.findPath(). findPath currently works by
> looping over all files held for a given writer, and traversing the filesystem for each one.
> This takes way too long when tens of thousands of files are open by a single writer.
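
For context, the loop described in the summary above (shown only partially in the diff context earlier in this message) has roughly the following shape; this is a simplified sketch, and the exact exception handling in the real source may differ:

{code}
    private String findPathOrig(INodeFile pendingFile) {
      try {
        // One iteration per open file held under this writer's lease; each
        // getINode(src) resolves the path from the root, so the total cost is
        // roughly O(open files * directory depth) per findPath call.
        for (String src : paths) {
          INode node = fsnamesystem.dir.getINode(src);
          if (node == pendingFile) {
            return src;
          }
        }
      } catch (UnresolvedLinkException e) {
        // Lease paths are expected to resolve entirely within this namesystem.
        throw new AssertionError("Lease files should reside on this FS: " + e);
      }
      return null;
    }
{code}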



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
