hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Eli Collins (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-988) saveNamespace can corrupt edits log, apparently due to race conditions
Date Thu, 02 Jun 2011 00:10:47 GMT

     [ https://issues.apache.org/jira/browse/HDFS-988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Eli Collins updated HDFS-988:

    Attachment: hdfs-988-5.patch

Thanks for taking a look Todd. Updated patch attached.

bq. checks for if (auditLog.isInfoEnabled()) should probably now be (auditLog.isInfoEnabled()
&& isExternalInvocation()) – otherwise we're doing a needless directory traversal
for fsck


bq. The following methods currently do logSync() while holding the writeLock, which is expensive:

Fixed. (Only one needed to conditionally call logSync)

bq. seems strange that some of the xInternal() methods take the write lock themselves (eg
setReplicationInternal) whereas others assume the caller takes the write lock (eg createSymlinkInternal).
We should be consistent.

Latest patch makes them more consistent, I also refactored out a couple new xInternal methods.
In a couple places (eg deleteInternal and getListing) I didn't hoist up the locking because
it would make the locking too coarse-grain (eg would result in syncing the log w/ the lock

bq. for those methods that don't explicitly take the write lock, we should either add an assert
hasWriteLock() or a comment explaining why the lock is not necessary (eg internalReleaseLease,
reassignLease, finalizeINodeFileUnderConstruction)

Done. For FSDirectory I made the unprotectedX methods actually unprotected and moved the locking
to the caller (except for FSEditLogLoader which calls the unprotected methods directly on
purpose - I doubt this really saves us that much). These methods (per their name) are now
intentionally unprotected. 

bq. comment for endCheckpoint says "not started" but should say "not ended".  same with updatePipeline.

Both fixed.

bq. why doesn't getListing need the read lock?

Because its callees (check*, getListing) take the lock.

bq. I noticed that nextGenerationStamp() doesn't logSync() – that seems dangerous, since
after a restart we might hand out a duplicate genstamp.

Good catch. I made sure all callers sync the log (this was only missing from the updateBlockForPipeline
path). nextGenerationStamp is always called with the lock held so I asserted that and removed
the lock aquisition from this method.

> saveNamespace can corrupt edits log, apparently due to race conditions
> ----------------------------------------------------------------------
>                 Key: HDFS-988
>                 URL: https://issues.apache.org/jira/browse/HDFS-988
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20-append, 0.21.0, 0.22.0
>            Reporter: dhruba borthakur
>            Assignee: Eli Collins
>            Priority: Blocker
>             Fix For: 0.20-append, 0.22.0
>         Attachments: HDFS-988_fix_synchs.patch, hdfs-988-2.patch, hdfs-988-3.patch, hdfs-988-4.patch,
hdfs-988-5.patch, hdfs-988.txt, saveNamespace.txt, saveNamespace_20-append.patch
> The adminstrator puts the namenode is safemode and then issues the savenamespace command.
This can corrupt the edits log. The problem is that  when the NN enters safemode, there could
still be pending logSycs occuring from other threads. Now, the saveNamespace command, when
executed, would save a edits log with partial writes. I have seen this happen on 0.20.
> https://issues.apache.org/jira/browse/HDFS-909?focusedCommentId=12828853&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12828853

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message