hadoop-hdfs-issues mailing list archives

From "Zesheng Wu (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-4882) Namenode checkLeases() runs into infinite loop
Date Wed, 05 Jun 2013 11:29:20 GMT
Zesheng Wu created HDFS-4882:
--------------------------------

             Summary: Namenode checkLeases() runs into infinite loop
                 Key: HDFS-4882
                 URL: https://issues.apache.org/jira/browse/HDFS-4882
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client, namenode
    Affects Versions: 2.0.0-alpha
            Reporter: Zesheng Wu


Scenario:
1. The cluster has 4 DNs.
2. The file to be written is slightly larger than one block.
3. The client writes the first block to 3 DNs through the pipeline DN1->DN2->DN3.
4. All data packets of the first block are successfully acked and the client sets the pipeline stage to PIPELINE_CLOSE, but the last packet has not been sent out yet.
5. DN2 and DN3 go down.
6. The client recovers the pipeline, but no new DN is added to it because the current pipeline stage is PIPELINE_CLOSE.
7. The client continues writing the last block and tries to close the file after all the data has been written.
8. The NN finds that the penultimate block doesn't have enough replicas (our dfs.namenode.replication.min=2), so the client's close runs into an indefinite loop (HDFS-2936). At the same time, the NN sets the last block's state to COMPLETE.
9. The client is shut down.
10. The file's lease exceeds the hard limit.
11. LeaseManager detects this and begins lease recovery by calling fsnamesystem.internalReleaseLease().
12. But the last block's state is COMPLETE, which sends the lease manager into an infinite loop that prints massive logs like this:
{noformat}
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1] has expired hard limit
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering lease=[Lease.  Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1], src=/user/h_wuzesheng/test.dat
2013-06-05,17:42:25,695 WARN org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: File = /user/h_wuzesheng/test.dat, block blk_-7028017402720175688_1202597, lastBLockState=COMPLETE
2013-06-05,17:42:25,695 INFO org.apache.hadoop.hdfs.server.namenode.LeaseManager: Started block recovery for file /user/h_wuzesheng/test.dat lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-1252656407_1, pendingcreates: 1]
{noformat}
(The third log line is a debug log that we added.)
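
To illustrate why steps 11 and 12 of the scenario above never terminate, here is a minimal, self-contained Java sketch. It is not the actual LeaseManager or FSNamesystem code: the class LeaseLoopSketch, the method tryRelease() and the iteration cap are hypothetical. It assumes only what the report describes: the check always picks the oldest expired lease, and a release attempt on a file whose last block is COMPLETE leaves that lease in place, unchanged.

```java
import java.util.TreeMap;

// Simplified model of the reported loop; all names here are hypothetical.
public class LeaseLoopSketch {
    enum BlockState { UNDER_CONSTRUCTION, COMPLETE }

    // Expired leases keyed by holder. Sorting by holder name stands in for
    // the real ordering by last renewal time.
    private final TreeMap<String, BlockState> expiredLeases = new TreeMap<>();

    void addExpiredLease(String holder, BlockState lastBlockState) {
        expiredLeases.put(holder, lastBlockState);
    }

    // Stand-in for internalReleaseLease(). Assumption: when the last block
    // is already COMPLETE the file cannot be closed, and the lease is
    // neither removed nor renewed.
    private boolean tryRelease(String holder) {
        if (expiredLeases.get(holder) == BlockState.COMPLETE) {
            return false; // lease stays at the head of the sorted set
        }
        expiredLeases.remove(holder); // modelled as a successful release
        return true;
    }

    // Models the check loop. The real loop has no iteration cap; the cap
    // exists only so that this sketch terminates. Returns the number of
    // iterations performed.
    int checkLeases(int maxIterations) {
        int iterations = 0;
        while (!expiredLeases.isEmpty() && iterations < maxIterations) {
            String oldest = expiredLeases.firstKey();
            tryRelease(oldest); // result ignored, so nothing forces progress
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        LeaseLoopSketch healthy = new LeaseLoopSketch();
        healthy.addExpiredLease("client-A", BlockState.UNDER_CONSTRUCTION);
        System.out.println("healthy iterations: " + healthy.checkLeases(1000));

        LeaseLoopSketch stuck = new LeaseLoopSketch();
        stuck.addExpiredLease("client-B", BlockState.COMPLETE);
        System.out.println("stuck iterations: " + stuck.checkLeases(1000));
    }
}
```

In this model the healthy case finishes after one iteration, while the stuck case uses up the whole cap because the same lease is selected on every pass. Without the cap the loop would spin forever, which matches the repeated log lines above.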
