hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1149) Lease reassignment is not persisted to edit log
Date Thu, 02 Jun 2011 02:36:47 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042578#comment-13042578 ]

Todd Lipcon commented on HDFS-1149:
-----------------------------------

A few nits:

- for DataNode.setHeartbeatsEnabled, I think it would be better to make it package-private,
and then bounce through the "DataNodeAdapter" class to get at it. I also think it would be
clearer if we inverted its meaning and renamed it to {{heartbeatsDisabledForTests}} - that
way when reading the code later it will be clear that this is always false in normal operation.
- Same goes for all of the new public members in LeaseManager/Lease -- I think you can just
move the getLeaseByPath function into NameNodeAdapter, then it can all stay package-private,
right?
- In the test case, I think it's better to call {{stm.hflush()}} after the writer has lost
its lease -- this is a DN-only operation, which means that it's verifying that the lease recovery
has gone all the way through, not just a NN state change. The fact that you check isUnderConstruction
should already do that as well, but just a double-check. Then you can close the stream as
well and check for the same exception.
- I think the new NAMENODE_LEASE_MANAGER_SLEEP_TIME is probably better named NAMENODE_LEASE_RECHECK_INTERVAL
(more consistent with other variables like {{heartbeatRecheckInterval}} and {{replicationRecheckInterval}})
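
To illustrate the first two nits, here is a minimal sketch of the pattern I have in mind. The classes below are simplified stand-ins, not the real DataNode or DataNodeAdapter, and the method names are only assumptions about how the patch might look after the rename:

```java
// Simplified stand-ins for illustration; not the real HDFS classes.
class DataNode {
    // Always false in normal operation; only test code ever sets it.
    private volatile boolean heartbeatsDisabledForTests = false;

    // Package-private: reachable from an adapter in the same package,
    // but not part of the public surface of the class.
    void setHeartbeatsDisabledForTests(boolean disabled) {
        this.heartbeatsDisabledForTests = disabled;
    }

    // Hypothetical helper showing how the heartbeat loop would read the flag.
    boolean shouldSendHeartbeat() {
        return !heartbeatsDisabledForTests;
    }
}

// Test-only adapter that lives in the same package as DataNode.
final class DataNodeAdapter {
    private DataNodeAdapter() {}

    public static void setHeartbeatsDisabledForTests(DataNode dn, boolean disabled) {
        dn.setHeartbeatsDisabledForTests(disabled);
    }
}
```

Tests in other packages would call the adapter, so nothing on DataNode itself needs to be public. The same shape should work for getLeaseByPath via NameNodeAdapter.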

Other concern:
- Does this interact correctly with lease maintenance on rename/delete? I think so... but
it would be good to add the following tests:

Test A:
1) client creates file /dir_a/file and leaves it open
2) client renames /dir_a to /dir_b   (this calls LeaseManager.changeLease)
3) client dies, so lease recovery happens
4) NN reassigns lease to NN_Recovery
5) NN restarts and loads edits: NN_Recovery should own the lease on the new location of the
file

[ this tests that on edit log replay, the lease is properly tracked to the new name of the
file ]

Test B:
1) client creates file /file and leaves it open
2) client deletes file /file
3) client dies, so lease recovery happens
4) NN reassigns lease to NN_Recovery
5) NN restarts and loads edits: no NPEs or anything
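
To make the expected outcomes concrete, here is a toy model of the two scenarios. It is only a sketch: ToyNamespace, its op names, and the string-array edit log are all made up for illustration and do not correspond to the real FSEditLog or LeaseManager code. The real tests would need MiniDFSCluster and an actual NN restart.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model for illustration only; none of this is real HDFS code.
class ToyNamespace {
    // Path of each open file -> current lease holder.
    final Map<String, String> leases = new HashMap<>();
    // Ops in the order they were applied, e.g. {"CREATE", "/file", "client1"}.
    final List<String[]> editLog = new ArrayList<>();

    void apply(boolean log, String... op) {
        switch (op[0]) {
            case "CREATE":
                leases.put(op[1], op[2]);
                break;
            case "RENAME":
                renamePrefix(op[1], op[2]);
                break;
            case "DELETE":
                leases.remove(op[1]);
                break;
            case "REASSIGN_LEASE":
                // Tolerate a path whose lease is already gone (Test B).
                if (leases.containsKey(op[1])) {
                    leases.put(op[1], op[2]);
                }
                break;
            default:
                throw new IllegalArgumentException("unknown op " + op[0]);
        }
        if (log) {
            editLog.add(op);
        }
    }

    // Move every lease at or under src so that it sits under dst instead.
    private void renamePrefix(String src, String dst) {
        Map<String, String> moved = new HashMap<>();
        for (String path : new ArrayList<>(leases.keySet())) {
            if (path.equals(src) || path.startsWith(src + "/")) {
                moved.put(dst + path.substring(src.length()), leases.remove(path));
            }
        }
        leases.putAll(moved);
    }

    // Simulates an NN restart: rebuild lease state purely from the edit log.
    static ToyNamespace replay(List<String[]> log) {
        ToyNamespace ns = new ToyNamespace();
        for (String[] op : log) {
            ns.apply(false, op);
        }
        return ns;
    }
}
```

In this model, Test A passes if the replayed namespace maps /dir_b/file to NN_Recovery and has no entry for /dir_a/file; Test B passes if replay completes without throwing and leaves no lease on /file.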


I'm also wondering if we have an issue with regard to safemode. In theory we should never
write anything to the edit log while in safemode, but I don't see safemode checks in internalReleaseLease.
This is similar to the bugs seen in HDFS-988, if you want some background.
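
Roughly the kind of guard I mean, as a hypothetical sketch. ToyNamesystem and its methods are invented for illustration, and whether the real code should defer the reassignment or throw is an open question for the patch:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the real FSNamesystem API.
class ToyNamesystem {
    private boolean inSafeMode = true;
    private final List<String> editLog = new ArrayList<>();

    void leaveSafeMode() {
        inSafeMode = false;
    }

    // Returns true if the reassignment was applied and logged,
    // false if it was deferred because the NN is in safemode.
    boolean reassignLease(String path, String newHolder) {
        if (inSafeMode) {
            // Nothing may be written to the edit log while in safemode.
            return false;
        }
        editLog.add("REASSIGN_LEASE " + path + " " + newHolder);
        return true;
    }

    int editCount() {
        return editLog.size();
    }
}
```

The point is only that the check sits in front of anything that writes to the edit log.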


> Lease reassignment is not persisted to edit log
> -----------------------------------------------
>
>                 Key: HDFS-1149
>                 URL: https://issues.apache.org/jira/browse/HDFS-1149
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.21.0, 0.22.0, 0.23.0
>            Reporter: Todd Lipcon
>            Assignee: Aaron T. Myers
>             Fix For: 0.23.0
>
>         Attachments: hdfs-1149.0.patch
>
>
> During lease recovery, the lease gets reassigned to a special NN holder. This is not
> currently persisted to the edit log, which means that after an NN restart, the original leaseholder
> could end up allocating more blocks or completing a file that has already started recovery.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
