hadoop-hdfs-issues mailing list archives

From "Vinayakumar B (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10220) Namenode failover due to too long locking in LeaseManager.Monitor
Date Thu, 31 Mar 2016 09:44:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219677#comment-15219677 ]

Vinayakumar B commented on HDFS-10220:
--------------------------------------

{code}
+  public static final String  DFS_NAMENODE_MAX_RELEASED_LEASES_PER_ITERATION_KEY = "dfs.namenode.lease-manager.max-released-leases-per-iteration";
+  public static final long    DFS_NAMENODE_MAX_RELEASED_LEASES_PER_ITERATION_DEFAULT = 100000;
{code}
I think the property name could be "dfs.namenode.lease-manager.max-files-checked-per-iteration", since not all leases may be released immediately; for some, block recovery will be triggered instead.
Also, the default value of 100000 seems quite high. Since this is the NameNode explicitly recovering unclosed files from dead clients, there is no need to hurry. I feel 1000 would be enough, similar to BLOCK_DELETION_INCREMENT used for removing a large number of blocks.
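To illustrate, the rename could look something like this (just a sketch of the suggestion, not the patch itself; the constant names and default value are only proposals):
{code}
// Sketch only: suggested key rename and lower default, following the comment above.
public static final String DFS_NAMENODE_MAX_FILES_CHECKED_PER_ITERATION_KEY =
    "dfs.namenode.lease-manager.max-files-checked-per-iteration";
public static final long   DFS_NAMENODE_MAX_FILES_CHECKED_PER_ITERATION_DEFAULT = 1000;
{code}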


{code}
+  LeaseManager(FSNamesystem fsnamesystem) {
+    this.fsnamesystem = fsnamesystem;
+    Configuration conf = new Configuration();
+    this.maxFilesCheckedForRelease = conf.getLong(DFS_NAMENODE_MAX_RELEASED_LEASES_PER_ITERATION_KEY,
+      DFS_NAMENODE_MAX_RELEASED_LEASES_PER_ITERATION_DEFAULT);
+  }
{code}
Creating a new Configuration instance will always load the configuration from the XML files. This makes it impossible to change values on the fly for tests. It would be better to pass the Configuration from FSNamesystem when creating the LeaseManager.
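For example, the constructor could look something like this (a rough sketch assuming the renamed key above; the exact signature is up to the patch author):
{code}
// Sketch only: take the Configuration from FSNamesystem instead of creating a new one.
LeaseManager(FSNamesystem fsnamesystem, Configuration conf) {
  this.fsnamesystem = fsnamesystem;
  this.maxFilesCheckedForLeaseRecovery = conf.getLong(
      DFS_NAMENODE_MAX_FILES_CHECKED_PER_ITERATION_KEY,
      DFS_NAMENODE_MAX_FILES_CHECKED_PER_ITERATION_DEFAULT);
}
{code}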

{code}
+  @VisibleForTesting
+  public void setMaxFilesCheckedForRelease(long maxFilesCheckedForRelease) {
+    this.maxFilesCheckedForRelease = maxFilesCheckedForRelease;
+  }
{code}
When the conf is passed from FSNamesystem, this setter may not be required; the test can change the configuration itself.
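Something like this in the test should be enough (a sketch, assuming the constructor change above; the key name is the proposed one):
{code}
// Sketch only: the test tweaks the conf before the namesystem/LeaseManager is built,
// so no test-only setter is needed.
Configuration conf = new Configuration();
conf.setLong("dfs.namenode.lease-manager.max-files-checked-per-iteration", 10);
// ... start the namesystem/cluster with this conf; LeaseManager picks up the value.
{code}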


{code}
+      long numLeasesPathsReleased = 0;
.
.
.
+      numLeasesPathsReleased += numLeasePaths;
+      //Relinquish FSNamesystem lock after maxPathRealeaseExpiredLease iterations
+      if (numLeasesPathsReleased >= maxFilesCheckedForRelease) {
+        LOG.warn("Breaking out of checkLeases() after " + numLeasesPathsReleased + " iterations");
+        break;
+      }
{code}
This will not ensure that the loop exits exactly after the maximum number of lease paths has been released. If a single client itself has a large number of open files, the count can overshoot the limit, so this may not be correct.
It can be modified as below.
{code}
@@ -356,7 +363,9 @@ synchronized boolean checkLeases() {
     boolean needSync = false;
     assert fsnamesystem.hasWriteLock();
 
-    while(!sortedLeases.isEmpty() && sortedLeases.peek().expiredHardLimit()) {
+    int filesLeasesChecked = 0;
+    while (!sortedLeases.isEmpty() && sortedLeases.peek().expiredHardLimit()
+        && filesLeasesChecked < maxFilesCheckedForLeaseRecovery) {
       Lease leaseToCheck = sortedLeases.poll();
       LOG.info(leaseToCheck + " has expired hard limit");
 
@@ -396,6 +405,14 @@ synchronized boolean checkLeases() {
           LOG.error("Cannot release the path " + p + " in the lease "
               + leaseToCheck, e);
           removing.add(id);
+        } finally{
+          filesLeasesChecked++;
+          if (filesLeasesChecked >= maxFilesCheckedForLeaseRecovery) {
+            LOG.warn(
+                "Breaking out of checkLeases() after " + filesLeasesChecked
+                    + " file leases checked.");
+            break;
+          }
         }
       }
{code}

> Namenode failover due to too long locking in LeaseManager.Monitor
> ----------------------------------------------------------------
>
>                 Key: HDFS-10220
>                 URL: https://issues.apache.org/jira/browse/HDFS-10220
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Nicolas Fraison
>            Priority: Minor
>         Attachments: HADOOP-10220.001.patch, HADOOP-10220.002.patch, threaddump_zkfc.txt
>
>
> I have faced a namenode failover due to an unresponsive namenode detected by the zkfc, with lots of WARN messages (5 million) like this one:
> _org.apache.hadoop.hdfs.StateChange: BLOCK* internalReleaseLease: All existing blocks are COMPLETE, lease removed, file closed._
> In the threaddump taken by the zkfc there are lots of threads blocked due to a lock.
> Looking at the code, there is a lock taken by the LeaseManager.Monitor when some leases must be released. Due to the really big number of leases to be released, the namenode took too much time to release them, blocking all other tasks and making the zkfc think that the namenode was not available/stuck.
> The idea of this patch is to limit the number of leases released each time we check for leases, so the lock won't be held for too long a period.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
