hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-988) saveNamespace can corrupt edits log
Date Wed, 28 Apr 2010 07:36:35 GMT

     [ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated HDFS-988:

    Attachment: hdfs-988.txt

Attaching an updated patch for trunk. Additions:
- commitBlockSynchronization is no longer allowed in safemode. Konstantin, do you still prefer
that we open a new issue for this? Dhruba seems to agree that it should be disallowed.
- startCheckpoint, endCheckpoint, and updatePipeline also check safemode now
- the new delegation token logging methods check safemode as well.

Some questions for review:
- Will logUpdateMasterKey be OK with the SafeModeException?
- Are there some asserts we could add to make it easier to catch these bugs in the future?
For example, we could assert !namesystem.isInSafeMode() in FSEditLog.logSync(). Then, if we
ran the unit tests with assertions enabled, we'd probably notice if we were accidentally
making edits while in safe mode.
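
A minimal sketch of what such an assertion might look like. SafeModeState and SketchEditLog are hypothetical stand-ins invented for illustration, not the real FSNamesystem or FSEditLog, and this is not taken from the attached patch. The assert is only evaluated when the JVM runs with -ea, so it would cost nothing in production.

```java
// Hypothetical stand-in for the namesystem's safe mode flag.
class SafeModeState {
    private volatile boolean inSafeMode;

    void enter() { inSafeMode = true; }
    void leave() { inSafeMode = false; }
    boolean isInSafeMode() { return inSafeMode; }
}

// Hypothetical stand-in for the edit log; only counts syncs.
class SketchEditLog {
    private final SafeModeState namesystem;
    private int syncedTxns;

    SketchEditLog(SafeModeState namesystem) {
        this.namesystem = namesystem;
    }

    synchronized void logSync() {
        // Checked only when assertions are enabled (java -ea).
        assert !namesystem.isInSafeMode() : "logSync() called while in safe mode";
        syncedTxns++;
    }

    synchronized int syncedTxns() { return syncedTxns; }
}
```

With assertions enabled in the test JVM, any code path that reaches logSync() after enter() would fail immediately with an AssertionError instead of silently writing an edit.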

Some responses to review above:

bq. FSNamesystem.getAdditionalBlock() checking isInSafeMode() should be before calling chooseTargets().
I would not change getAdditionalBlock() at all.

Right now, getAdditionalBlock is split into two synchronized blocks. The safe mode status
could switch between the two. Are you suggesting that we check safemode in both, or we combine
the blocks into one? I assumed the intent was to avoid doing the potentially CPU-heavy chooseTarget
work while synchronized.
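
To illustrate the first option (checking safemode in both blocks), here is a minimal sketch. SketchNamesystem, its SafeModeException stand-in, the placeholder chooseTargets(), and the betweenPhases hook are all assumptions made up for this example; none of it is the actual FSNamesystem code or the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in exception; unchecked here only to keep the sketch short.
class SafeModeException extends RuntimeException {
    SafeModeException(String msg) { super(msg); }
}

class SketchNamesystem {
    private boolean safeMode;
    private final List<String> allocated = new ArrayList<>();

    synchronized void setSafeMode(boolean on) { safeMode = on; }
    synchronized int allocatedCount() { return allocated.size(); }

    // Caller must hold the namesystem lock.
    private void checkNotInSafeMode(String op) {
        if (safeMode) {
            throw new SafeModeException("Cannot " + op + " in safe mode");
        }
    }

    // Placeholder for the potentially CPU-heavy placement work.
    private String chooseTargets(String src) { return "dn1,dn2,dn3"; }

    // betweenPhases is a hypothetical hook that runs in the unlocked window.
    String getAdditionalBlock(String src, Runnable betweenPhases) {
        synchronized (this) {
            // First check: fail fast before doing the expensive work.
            checkNotInSafeMode("add block to " + src);
        }
        String targets = chooseTargets(src); // done without the lock
        betweenPhases.run();
        synchronized (this) {
            // Second check: safe mode may have been entered in between.
            checkNotInSafeMode("add block to " + src);
            String block = src + "@" + targets;
            allocated.add(block);
            return block;
        }
    }
}
```

The second check is the one that protects the edit itself; the first only avoids wasted work. Combining the two blocks into one would remove the window entirely, at the cost of holding the lock during target selection.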

bq. renewLease() shouldn't be under FSNamesystem lock? leaseManager has its own lock

This is to prevent safemode from switching while calling renewLease. If we decide that renewing
a lease under safemode is not allowed, then we need to synchronize here. Otherwise the check
is prone to races.

Regarding deadlock potential, I think we're safe since LeaseManager.Monitor synchronizes on
FSNamesystem before synchronizing on the lease manager.
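
The lock-ordering argument can be sketched as follows. LockOrderSketch and its two lock objects are hypothetical stand-ins for FSNamesystem and the lease manager, written only to show that both paths take the locks in the same order; it is not the real HDFS code.

```java
class LockOrderSketch {
    private final Object namesystemLock = new Object();
    private final Object leaseManagerLock = new Object();
    private boolean safeMode;
    private int renewals;
    private int monitorPasses;

    // Client path: namesystem lock first, then lease manager lock.
    void renewLease(String holder) {
        synchronized (namesystemLock) {
            if (safeMode) {
                throw new IllegalStateException("Cannot renew lease for " + holder + " in safe mode");
            }
            synchronized (leaseManagerLock) {
                renewals++;
            }
        }
    }

    // Monitor path: same order, so the two paths cannot deadlock each other.
    void monitorPass() {
        synchronized (namesystemLock) {
            synchronized (leaseManagerLock) {
                monitorPasses++;
            }
        }
    }

    // Runs both paths concurrently and returns {renewals, monitorPasses}.
    static int[] runConcurrently(int iterations) {
        LockOrderSketch s = new LockOrderSketch();
        Thread client = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.renewLease("client");
        });
        Thread monitor = new Thread(() -> {
            for (int i = 0; i < iterations; i++) s.monitorPass();
        });
        client.start();
        monitor.start();
        try {
            client.join();
            monitor.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] { s.renewals, s.monitorPasses };
    }
}
```

A deadlock would require one path to take the lease manager lock first and then wait for the namesystem lock; as long as every caller follows the order above, that cannot happen.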

bq. Your changes to permission methods incorporate HDFS-133.

Resolved that one as dup, thanks.

> saveNamespace can corrupt edits log
> -----------------------------------
>                 Key: HDFS-988
>                 URL: https://issues.apache.org/jira/browse/HDFS-988
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.21.0, 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Todd Lipcon
>         Attachments: hdfs-988.txt, saveNamespace.txt
> The administrator puts the namenode in safemode and then issues the savenamespace command.
> This can corrupt the edits log. The problem is that when the NN enters safemode, there could
> still be pending logSyncs occurring from other threads. Now, the saveNamespace command, when
> executed, would save an edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
