hadoop-hdfs-dev mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-10763) Open files can leak permanently due to inconsistent lease update
Date Mon, 15 Aug 2016 16:56:20 GMT
Kihwal Lee created HDFS-10763:

             Summary: Open files can leak permanently due to inconsistent lease update
                 Key: HDFS-10763
                 URL: https://issues.apache.org/jira/browse/HDFS-10763
             Project: Hadoop HDFS
          Issue Type: Bug
    Affects Versions: 2.6.4, 2.7.3
            Reporter: Kihwal Lee
            Priority: Critical

This can happen during {{commitBlockSynchronization()}} or when a client gives up on closing a
file after retries.
In {{finalizeINodeFileUnderConstruction()}}, the lease is removed first and then the
inode is turned into the closed state. But if any block is not in COMPLETE state,
{{INodeFile#assertAllBlocksComplete()}} will throw an exception. As a result, the lease
is removed from the lease manager, but not from the inode. Since the lease manager does not
have a lease for the file, no lease recovery will happen for this file. Moreover, this broken
state is persisted and reconstructed through saving and loading of fsimage. Since no replication
is scheduled for the blocks of the file, this can cause data loss and also block decommissioning
of datanodes.
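
The ordering problem described above can be illustrated with a small toy model. This is only a sketch: {{ToyInode}}, {{ToyLeaseManager}}, {{finalizeLeaseFirst()}} and {{isLeaked()}} are hypothetical names made up for illustration and are not the actual NameNode classes or methods. It only shows how removing the lease before the block check leaves the two structures out of sync when the check throws.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only. These are NOT the real HDFS NameNode classes.
class LeaseLeakSketch {

    // Stand-in for an inode with the under-construction feature.
    static class ToyInode {
        final String path;
        final List<Boolean> blockComplete; // true = block is COMPLETE
        boolean underConstruction = true;

        ToyInode(String path, List<Boolean> blockComplete) {
            this.path = path;
            this.blockComplete = blockComplete;
        }

        // Plays the role of the all-blocks-complete assertion.
        void assertAllBlocksComplete() {
            if (blockComplete.contains(false)) {
                throw new IllegalStateException("block not COMPLETE in " + path);
            }
        }
    }

    // Stand-in for the lease manager: just the set of leased paths.
    static class ToyLeaseManager {
        final Set<String> leasedPaths = new HashSet<>();
    }

    // Problematic order: the lease is dropped before the block check runs.
    static void finalizeLeaseFirst(ToyLeaseManager lm, ToyInode inode) {
        lm.leasedPaths.remove(inode.path);
        inode.assertAllBlocksComplete(); // may throw after the lease is gone
        inode.underConstruction = false;
    }

    // Under construction but without a lease: nothing will ever recover it.
    static boolean isLeaked(ToyLeaseManager lm, ToyInode inode) {
        return inode.underConstruction && !lm.leasedPaths.contains(inode.path);
    }

    public static void main(String[] args) {
        ToyLeaseManager lm = new ToyLeaseManager();
        ToyInode inode = new ToyInode("/xyz/xyz", Arrays.asList(true, false));
        lm.leasedPaths.add(inode.path);
        try {
            finalizeLeaseFirst(lm, inode);
        } catch (IllegalStateException e) {
            System.out.println("finalize failed: " + e.getMessage());
        }
        System.out.println("leaked = " + isLeaked(lm, inode));
    }
}
```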

The lease cannot be manually recovered either. It fails with
...AlreadyBeingCreatedException): Failed to RECOVER_LEASE /xyz/xyz for user1 on because the file is under construction but no leases found.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2950)

When a client retries {{close()}}, the same inconsistent state is created, but the close can
succeed on the next attempt since {{checkLease()}} only looks at the inode, not the lease manager, in
this case. The close behavior is different if HDFS-8999 is activated by setting {{dfs.namenode.file.close.num-committed-allowed}}
to 1 (unlikely) or 2 (never).

In principle, the under-construction feature of an inode and the lease in the lease manager
should never go out of sync. The fix involves two parts.
1) Prevent inconsistent lease updates. We can achieve this by calling {{removeLease()}} after
checking the block state. 
2) Avoid reconstructing inconsistent lease states from an fsimage. 1) alone does not correct
the existing inconsistencies that survive through fsimages. This can be done at fsimage
loading time by making sure a corresponding lease exists for each inode that has the
under-construction feature.
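
Both parts of the proposed fix can be sketched on a toy model. This is not the actual patch: {{ToyInode}}, {{finalizeCheckFirst()}} and {{restoreMissingLeases()}} are hypothetical names, and the lease manager is reduced to a path-to-holder map. Re-adding a lease under a placeholder holder in part 2 is an assumption about one possible way to "make sure a lease exists"; the real change may handle this differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only. These are NOT the real HDFS NameNode classes.
class LeaseFixSketch {

    static class ToyInode {
        final String path;
        final boolean[] blockComplete; // true = block is COMPLETE
        boolean underConstruction = true;

        ToyInode(String path, boolean... blockComplete) {
            this.path = path;
            this.blockComplete = blockComplete;
        }
    }

    // Part 1: check the block state first, remove the lease only afterwards.
    // If the check throws, the lease map and the inode are both untouched.
    static void finalizeCheckFirst(Map<String, String> leases, ToyInode inode) {
        for (boolean complete : inode.blockComplete) {
            if (!complete) {
                throw new IllegalStateException("block not COMPLETE in " + inode.path);
            }
        }
        leases.remove(inode.path);
        inode.underConstruction = false;
    }

    // Part 2: after loading an image, give every under-construction inode
    // a lease if it has none (placeholderHolder is a hypothetical holder name).
    static List<String> restoreMissingLeases(Map<String, String> leases,
                                             List<ToyInode> loadedInodes,
                                             String placeholderHolder) {
        List<String> repaired = new ArrayList<>();
        for (ToyInode inode : loadedInodes) {
            if (inode.underConstruction && !leases.containsKey(inode.path)) {
                leases.put(inode.path, placeholderHolder);
                repaired.add(inode.path);
            }
        }
        return repaired;
    }

    public static void main(String[] args) {
        Map<String, String> leases = new HashMap<>();
        ToyInode open = new ToyInode("/xyz/xyz", true, false);
        leases.put(open.path, "client-1");
        try {
            finalizeCheckFirst(leases, open);
        } catch (IllegalStateException e) {
            System.out.println("lease kept: " + leases.containsKey(open.path));
        }

        // Simulate an image that already carries the inconsistency.
        Map<String, String> loadedLeases = new HashMap<>();
        List<ToyInode> loaded = new ArrayList<>();
        loaded.add(new ToyInode("/leaked/file", true, false));
        System.out.println("repaired: "
            + restoreMissingLeases(loadedLeases, loaded, "placeholder-holder"));
    }
}
```

With the lease restored, normal lease recovery has something to act on again, which is the point of part 2.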

