hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3032) Lease renewer tries forever even if renewal is not possible
Date Tue, 06 Mar 2012 00:11:57 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13222810#comment-13222810 ]

Kihwal Lee commented on HDFS-3032:
----------------------------------

bq. I think it should retry even for RemoteException, since RemoteException only indicates
that the exception originated on the server side, but it may still be a transient problem, e.g. the rpc
server is very busy and starts dropping connections.

There are RemoteExceptions caused by permanent failure conditions. But since we are limiting
retries, I think it is safe to let the renewer retry as you suggested.
                
> Lease renewer tries forever even if renewal is not possible
> -----------------------------------------------------------
>
>                 Key: HDFS-3032
>                 URL: https://issues.apache.org/jira/browse/HDFS-3032
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client
>    Affects Versions: 0.23.0, 0.24.0, 0.23.1
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>             Fix For: 0.24.0, 0.23.2, 0.23.3
>
>         Attachments: hdfs-3032.patch.txt, hdfs-3032.patch.txt
>
>
> When LeaseRenewer gets an IOException while attempting to renew for a client, it retries
after sleeping 500ms. If the exception is caused by a condition that will never change, it
keeps talking to the name node until the DFSClient object is closed or aborted.  With the
FileSystem cache, a DFSClient can stay alive for a very long time. We have seen cases in which
node managers and long-lived jobs flood the name node with this type of call.
> The current proposal is to abort the client when a RemoteException is caught during renewal.
LeaseRenewer already aborts all clients when it sees a SocketTimeoutException.
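
For illustration only, the bounded-retry idea discussed in the comment above might look
roughly like the sketch below. This is not the attached patch and not the actual LeaseRenewer
code: the class BoundedLeaseRenewal, the RenewCall interface, and the maxAttempts parameter
are hypothetical names made up for this example. Only the 500ms sleep is taken from the
description.

```java
import java.io.IOException;

/** Hypothetical sketch; NOT the real HDFS LeaseRenewer or the HDFS-3032 patch. */
final class BoundedLeaseRenewal {

    /** One renewal attempt, e.g. an RPC to the name node (hypothetical interface). */
    interface RenewCall {
        void renew() throws IOException;
    }

    /**
     * Tries the call up to maxAttempts times, sleeping between failed attempts.
     * Returns true on success. Returns false when the attempts are exhausted
     * (or the thread is interrupted), so the caller can abort the client
     * instead of calling the name node forever.
     */
    static boolean renewWithLimit(RenewCall call, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                call.renew();
                return true;
            } catch (IOException e) {
                // The failure may be transient or permanent; the cap bounds the
                // load on the server in either case.
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(sleepMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt(); // preserve interrupt status
                        return false;
                    }
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Simulate a condition that never changes: every attempt fails.
        boolean renewed = renewWithLimit(() -> {
            calls[0]++;
            throw new IOException("simulated permanent failure");
        }, 3, 500L);
        System.out.println("renewed=" + renewed + " attempts=" + calls[0]);
    }
}
```

With a cap like this, a permanent failure costs a fixed number of calls per client, while a
transient failure (such as a busy rpc server) still gets a few chances to recover before the
client is aborted.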

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
