hadoop-hdfs-issues mailing list archives

From "Tsz Wo (Nicholas), SZE (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3032) Lease renewer tries forever even if renewal is not possible
Date Mon, 05 Mar 2012 19:53:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222556#comment-13222556 ]

Tsz Wo (Nicholas), SZE commented on HDFS-3032:
----------------------------------------------

Hi Kihwal, I think we can simply change LeaseRenewer to retry up to a time limit, as below.
I set the limit to 2*HdfsConstants.LEASE_SOFTLIMIT_PERIOD because HdfsConstants.LEASE_SOFTLIMIT_PERIOD
is only one minute.  What do you think?
{code}
Index: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/LeaseRenewer.java
===================================================================
--- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/LeaseRenewer.java (revision 1297199)
+++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/LeaseRenewer.java (working copy)
@@ -430,7 +430,8 @@
     for(long lastRenewed = System.currentTimeMillis();
         clientsRunning() && !Thread.interrupted();
         Thread.sleep(getSleepPeriod())) {
-      if (System.currentTimeMillis() - lastRenewed >= getRenewalTime()) {
+      final long diff = System.currentTimeMillis() - lastRenewed;
+      if (diff >= getRenewalTime()) {
         try {
           renew();
           if (LOG.isDebugEnabled()) {
@@ -438,19 +439,19 @@
                 + " with renew id " + id + " executed");
           }
           lastRenewed = System.currentTimeMillis();
-        } catch (SocketTimeoutException ie) {
+        } catch (IOException ie) {
+          final boolean abort = diff > 2*HdfsConstants.LEASE_SOFTLIMIT_PERIOD;
           LOG.warn("Failed to renew lease for " + clientsString() + " for "
-              + (getRenewalTime()/1000) + " seconds.  Aborting ...", ie);
-          synchronized (this) {
-            for(DFSClient c : dfsclients) {
-              c.abort();
+              + (getRenewalTime()/1000) + " seconds.  "
+              + (abort? "Aborting ...": "Will retry shortly ..."), ie);
+          if (abort) {
+            synchronized (this) {
+              for(DFSClient c : dfsclients) {
+                c.abort();
+              }
             }
+            break;
           }
-          break;
-        } catch (IOException ie) {
-          LOG.warn("Failed to renew lease for " + clientsString() + " for "
-              + (getRenewalTime()/1000) + " seconds.  Will retry shortly ...",
-              ie);
         }
       }
{code}
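
The retry-up-to-a-time-limit idea in the patch above can also be shown as a standalone sketch, outside of LeaseRenewer. Everything named here (BoundedRenewSketch, Renewer, Clock, renewWithLimit) is hypothetical and not part of Hadoop, and the limit is passed in as a parameter rather than read from HdfsConstants. Unlike the real loop, the sketch does not sleep between attempts; it only illustrates the abort decision.

```java
import java.io.IOException;

/** Hypothetical illustration only; not Hadoop code. */
class BoundedRenewSketch {

  /** Hypothetical stand-in for the lease renewal call. */
  interface Renewer {
    void renew() throws IOException;
  }

  /** Hypothetical clock, so the loop can be exercised without real waiting. */
  interface Clock {
    long now();
  }

  /**
   * Keeps attempting renewal until it succeeds, or until a failed attempt
   * was started more than limitMs after the last successful renewal.
   *
   * @return true if the renewal succeeded, false if the caller should abort
   */
  static boolean renewWithLimit(Renewer renewer, Clock clock,
                                long lastRenewed, long limitMs) {
    while (true) {
      // Time since the last successful renewal, sampled before this attempt.
      final long diff = clock.now() - lastRenewed;
      try {
        renewer.renew();
        return true;
      } catch (IOException e) {
        // Give up only once the time limit has been exceeded.
        if (diff > limitMs) {
          return false;
        }
        // Otherwise fall through and retry (the real loop sleeps here).
      }
    }
  }
}
```

With a limit of two minutes, a renewer that always fails is retried while the elapsed time is within the limit and is abandoned on the first failure after it, which is the behaviour the `abort` flag in the patch selects.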

                
> Lease renewer tries forever even if renewal is not possible
> -----------------------------------------------------------
>
>                 Key: HDFS-3032
>                 URL: https://issues.apache.org/jira/browse/HDFS-3032
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.23.0, 0.24.0, 0.23.1
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 0.24.0, 0.23.2, 0.23.3
>
>         Attachments: hdfs-3032.patch.txt
>
>
> When LeaseRenewer gets an IOException while attempting to renew for a client, it retries
> after sleeping 500ms. If the exception is caused by a condition that will never change, it
> keeps talking to the name node until the DFSClient object is closed or aborted.  With the
> FileSystem cache, a DFSClient can stay alive for a very long time. We've seen cases in which
> node managers and long-lived jobs flood the name node with this type of call.
> The current proposal is to abort the client when a RemoteException is caught during renewal.
> LeaseRenewer already aborts all clients when it sees a SocketTimeoutException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
