hadoop-hdfs-issues mailing list archives

From "Erik Steffl (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1072) AlreadyBeingCreatedException with HDFS_NameNode as the lease holder
Date Tue, 06 Apr 2010 21:05:33 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12854185#action_12854185 ]

Erik Steffl commented on HDFS-1072:
-----------------------------------

Further investigation revealed that the following sequence leads to AlreadyBeingCreatedException:

  - LEASE_LIMIT=500; cluster.setLeasePeriod(LEASE_LIMIT, LEASE_LIMIT);

  - thread A gets a lease on a file

  - thread B sleeps 2*soft limit

  - thread B tries to get a lease on the file, which triggers lease recovery, and gets RecoveryInProgressException

  - before lease recovery ends, the namenode (LeaseManager.java:checkLeases) finds that the hard
limit has also expired, starts a new recovery, and resets the timeouts

  - thread B tries to get the lease again; the timeout has not expired (it was reset in the previous step),
so it gets AlreadyBeingCreatedException
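
The sequence above can be replayed with a small toy model. This is only a sketch, not HDFS code: the class LeaseRaceSketch, its fields, and its methods are hypothetical, a single timer stands in for both limits (they are equal in the test), and the +1/+2 ms offsets are arbitrary values chosen to put the steps in order.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the race described above. Hypothetical names; not HDFS code. */
public class LeaseRaceSketch {
    // Soft and hard limit are both LEASE_LIMIT in the test, so one constant models both.
    static final long LIMIT_MS = 500;

    long lastRenewal;  // time the lease timer was last set or reset
    String holder;     // current lease holder

    LeaseRaceSketch(String holder, long now) {
        this.holder = holder;
        this.lastRenewal = now;
    }

    boolean limitExpired(long now) {
        return now - lastRenewal > LIMIT_MS;
    }

    /** Models the namenode's periodic check: on expiry, start a new recovery and reset the timer. */
    void checkLeases(long now) {
        if (limitExpired(now)) {
            holder = "HDFS_NameNode"; // the namenode takes over the lease for recovery
            lastRenewal = now;        // timeout reset: this is what hides the ongoing recovery
        }
    }

    /** Models another client asking for the lease; returns the name of the exception it would see. */
    String tryCreate(long now) {
        if (limitExpired(now)) {
            return "RecoveryInProgressException"; // expired limit: recovery is triggered
        }
        return "AlreadyBeingCreatedException";    // limit not expired: treated as a live lease
    }

    /** Replays the steps from the comment and records what thread B observes. */
    static List<String> replay() {
        List<String> seen = new ArrayList<>();
        LeaseRaceSketch lease = new LeaseRaceSketch("threadA", 0); // thread A gets the lease at t=0
        long t = 2 * LIMIT_MS;                 // thread B sleeps 2 * soft limit
        seen.add(lease.tryCreate(t));          // first attempt: limit expired
        lease.checkLeases(t + 1);              // namenode restarts recovery and resets the timer
        seen.add(lease.tryCreate(t + 2));      // second attempt: timer looks fresh
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(replay());
    }
}
```

In this model thread B's first attempt sees RecoveryInProgressException and its second sees AlreadyBeingCreatedException, with HDFS_NameNode as the holder, which matches the failure in the issue description.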

There are two problems in the code that lead to this:

  - the hard limit should not be set to such a low value; it makes it very likely that a recovery
will not finish before another recovery takes over (because the hard limit expired)

  - the namenode should recognize that, even though the limit has not expired, a recovery is ongoing,
and return RecoveryInProgressException instead of AlreadyBeingCreatedException (in FSNamesystem.java:startFileInternal,
where it decides what to do if the file is under construction)
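
The second point could look roughly like the decision below. This is a sketch under assumptions: StartFileDecisionSketch, Outcome, and decide are hypothetical names, and the two booleans stand in for whatever state startFileInternal actually consults; it is not the real FSNamesystem code or an actual patch.

```java
/** Hypothetical decision logic for a file that is under construction; not FSNamesystem code. */
public class StartFileDecisionSketch {
    enum Outcome { RECOVERY_IN_PROGRESS, ALREADY_BEING_CREATED }

    /**
     * @param limitExpired    whether the lease limit has expired for the current holder
     * @param recoveryOngoing whether a lease recovery has been started and has not finished
     */
    static Outcome decide(boolean limitExpired, boolean recoveryOngoing) {
        // Check for an ongoing recovery first, so a reset timer cannot mask it.
        if (recoveryOngoing) {
            return Outcome.RECOVERY_IN_PROGRESS;
        }
        // Expired limit with no recovery yet: recovery is triggered, caller should retry.
        if (limitExpired) {
            return Outcome.RECOVERY_IN_PROGRESS;
        }
        // A live lease held by another client.
        return Outcome.ALREADY_BEING_CREATED;
    }

    public static void main(String[] args) {
        // The case from the sequence above: timer was reset, but recovery is still running.
        System.out.println(decide(false, true));
    }
}
```

With this ordering, the case from the sequence above (limit not expired, recovery still running) yields RECOVERY_IN_PROGRESS, so the client would retry rather than fail.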

> AlreadyBeingCreatedException with HDFS_NameNode as the lease holder
> -------------------------------------------------------------------
>
>                 Key: HDFS-1072
>                 URL: https://issues.apache.org/jira/browse/HDFS-1072
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client, name-node
>    Affects Versions: 0.21.0
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: Erik Steffl
>             Fix For: 0.21.0
>
>
> TestReadWhileWriting may fail with AlreadyBeingCreatedException with HDFS_NameNode as the
> lease holder, which indicates that lease recovery is in an inconsistent state.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

