hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11945) Internal lease recovery may not be retried for a long time
Date Wed, 07 Jun 2017 14:05:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16040927#comment-16040927 ]

Kihwal Lee commented on HDFS-11945:

We could change the namenode lease holder ID every hour. Normally only two IDs would be
active in the system, and only for a brief moment; more can be active if there are failures.
If the ID is suffixed with a timestamp or date string, the log message for recovery will show
how old the leases are.
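
To make the idea concrete, here is a minimal sketch of an hourly holder ID. The class name, the {{HDFS_NameNode}} prefix, and the date format are assumptions for illustration only, not the actual namenode code or the eventual patch.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper: derives a lease holder ID that changes once per hour. */
class RotatingHolderId {
    // Assumed base name for the namenode's internal lease holder.
    static final String PREFIX = "HDFS_NameNode";

    // Hour-granularity suffix in UTC, e.g. 2017-06-07-14.
    private static final DateTimeFormatter HOUR =
        DateTimeFormatter.ofPattern("yyyy-MM-dd-HH").withZone(ZoneOffset.UTC);

    /** Returns the holder ID to use for recoveries started at the given time. */
    static String holderFor(Instant now) {
        return PREFIX + "-" + HOUR.format(now.truncatedTo(ChronoUnit.HOURS));
    }
}
```

Leases taken over in different hours would then sit under different holders, so renewing the current one would not touch older ones, and the suffix in a recovery log line would show roughly when the namenode took the lease over.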

The major cause of lease recovery failures is datanodes having problems during block recoveries.
One interesting case is when the namenode throws "server too busy" to datanodes. A {{commitBlockSynchronization()}}
call can fail for this reason and won't be retried.
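
For illustration only, a bounded retry around such a call might look like the sketch below. {{ServerBusyException}} and {{BusyRetry}} are hypothetical stand-ins, not Hadoop classes, and this is not how the datanode currently behaves.

```java
import java.util.concurrent.Callable;

/** Hypothetical stand-in for the "server too busy" error seen by the caller. */
class ServerBusyException extends Exception {
    ServerBusyException(String msg) { super(msg); }
}

/** Hypothetical helper: retries a call only when the server reports it is busy. */
class BusyRetry {
    static <T> T callWithRetry(Callable<T> call, int maxAttempts, long baseBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (ServerBusyException e) {
                if (attempt == maxAttempts) {
                    throw e; // give up; the caller sees the last failure
                }
                // Exponential backoff: base, 2*base, 4*base, ...
                Thread.sleep(baseBackoffMs << (attempt - 1));
            }
        }
    }
}
```

Any other exception propagates immediately, so only the busy case is retried.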

> Internal lease recovery may not be retried for a long time
> ----------------------------------------------------------
>                 Key: HDFS-11945
>                 URL: https://issues.apache.org/jira/browse/HDFS-11945
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: namenode
>            Reporter: Kihwal Lee
> A lease is assigned per client, which is identified by its holder ID or client ID, so a
> renewal or an expiration of a lease affects all files being written by that client.
> When a client/writer dies without closing a file, its lease expires in one hour (hard
> limit) and the namenode tries to recover the lease. As part of the process, the namenode
> takes ownership of the lease and renews it. If the recovery does not finish successfully,
> the lease will expire in one hour and the namenode will try again to recover it.
> However, if the file system has another lease expiring within the hour, the recovery attempt
> for that lease will push forward the expiration of the lease held by the namenode. This
> keeps failed lease recoveries from being retried for a long time. We have seen it happen for days.
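
The effect described above can be reproduced with a toy model. This is a sketch under stated assumptions, not the real lease manager: the class, the single fixed {{HDFS_NameNode}} holder, and the minute-granularity clock are all made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of per-holder leases with a one-hour hard limit. */
class LeaseSim {
    static final long HARD_LIMIT_MIN = 60;            // hard limit, in minutes
    static final String NN_HOLDER = "HDFS_NameNode";  // assumed single internal holder

    // Holder ID -> time of last renewal, in minutes.
    final Map<String, Long> lastRenewed = new HashMap<>();

    void renew(String holder, long now) { lastRenewed.put(holder, now); }

    boolean expired(String holder, long now) {
        Long t = lastRenewed.get(holder);
        return t != null && now - t >= HARD_LIMIT_MIN;
    }

    /** The namenode takes over a dead client's lease, renewing its own shared lease. */
    void takeOver(String deadHolder, long now) {
        lastRenewed.remove(deadHolder);
        renew(NN_HOLDER, now);
    }

    /**
     * A recovery fails at t=0, and afterwards some other client lease expires
     * every intervalMin minutes. Returns the first minute at which the
     * namenode-held lease is found expired (so the failed recovery would be
     * retried), or -1 if that never happens within horizonMin.
     */
    static long firstRetryTime(long intervalMin, long horizonMin) {
        LeaseSim sim = new LeaseSim();
        sim.takeOver("client-0", 0);
        for (long now = 1; now <= horizonMin; now++) {
            if (sim.expired(NN_HOLDER, now)) return now;
            if (now % intervalMin == 0) sim.takeOver("client-" + now, now);
        }
        return -1;
    }
}
```

In this model, {{LeaseSim.firstRetryTime(90, 1440)}} returns 60, while {{LeaseSim.firstRetryTime(45, 1440)}} returns -1: with another lease expiring every 45 minutes, the namenode-held lease is never seen as expired during the whole simulated day.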
