hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-909) Race condition between rollEditLog or rollFSImage ant FSEditsLog.write operations corrupts edits log
Date Mon, 08 Feb 2010 23:59:28 GMT

    [ https://issues.apache.org/jira/browse/HDFS-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12831200#action_12831200

Todd Lipcon commented on HDFS-909:

Unfortunately, I wasn't able to write a unit test that reproduces your race. This is because
the race only
occurs if the NN exits before rollEditLogs() is called at the end of FSImage.saveFSImage(boolean)
the race induces corruiption in EDITS, but EDITS_NEW is a correct empty file. rollEditLogs
thus fixes
up the state of the file.

I think we'll deal with this issue in the other JIRA regarding saveFSImage operation.

bq. think we will need to call waitForSyncToFinish() both before entering safe mode

I think it's actually impossible to be correct here. The issue with waitForSyncToFinish in
enterSafeMode is that many
of the FSNamesystem calls have a structure that looks like:
1 void someOperation() {
2   synchronized (this) {
3     if (!isInSafeMode()) { explode; }
4     internalSomeOperation();
5   }
6   getEditLog().logSync();
7 }

If we call enterSafeMode between line 5 and 6, a waitForSyncToFinish would return immediately,
since the sync isn't running yet. Really this is the case for any of lines 3-6 since enterSafeMode
is synchronized as well.

I think we need an additional method with stronger guarantees than waitForSyncToFinish - something
syncAllOutstandingOperations that waits until synctxid == txid.

What do you think?

> Race condition between rollEditLog or rollFSImage ant FSEditsLog.write operations  corrupts
edits log
> -----------------------------------------------------------------------------------------------------
>                 Key: HDFS-909
>                 URL: https://issues.apache.org/jira/browse/HDFS-909
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.20.1, 0.20.2, 0.21.0, 0.22.0
>         Environment: CentOS
>            Reporter: Cosmin Lehene
>            Assignee: Todd Lipcon
>            Priority: Blocker
>             Fix For: 0.21.0, 0.22.0
>         Attachments: hdfs-909-unittest.txt, hdfs-909.txt, hdfs-909.txt, hdfs-909.txt
> closing the edits log file can race with write to edits log file operation resulting
in OP_INVALID end-of-file marker being initially overwritten by the concurrent (in setReadyToFlush)
threads and then removed twice from the buffer, losing a good byte from edits log.
> Example:
> {code}
> FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> FSEditLog.closeStream()
-> EditLogOutputStream.setReadyToFlush()
> FSNameSystem.rollEditLog() -> FSEditLog.divertFileStreams() -> FSEditLog.closeStream()
-> EditLogOutputStream.flush() -> EditLogFileOutputStream.flushAndSync()
> OR
> FSNameSystem.rollFSImage() -> FSIMage.rollFSImage() -> FSEditLog.purgeEditLog()
-> FSEditLog.revertFileStreams() -> FSEditLog.closeStream() ->EditLogOutputStream.setReadyToFlush()

> FSNameSystem.rollFSImage() -> FSIMage.rollFSImage() -> FSEditLog.purgeEditLog()
-> FSEditLog.revertFileStreams() -> FSEditLog.closeStream() ->EditLogOutputStream.flush()
-> EditLogFileOutputStream.flushAndSync()
> FSNameSystem.completeFile -> FSEditLog.logSync() -> EditLogOutputStream.setReadyToFlush()
> FSNameSystem.completeFile -> FSEditLog.logSync() -> EditLogOutputStream.flush()
-> EditLogFileOutputStream.flushAndSync()
> OR 
> Any FSEditLog.write
> {code}
> Access on the edits flush operations is synchronized only in the FSEdits.logSync() method
level. However at a lower level access to EditsLogOutputStream setReadyToFlush(), flush()
or flushAndSync() is NOT synchronized. These can be called from concurrent threads like in
the example above
> So if a rollEditLog or rollFSIMage is happening at the same time with a write operation
it can race for EditLogFileOutputStream.setReadyToFlush that will overwrite the the last byte
(normally the FSEditsLog.OP_INVALID which is the "end-of-file marker") and then remove it
twice (from each thread) in flushAndSync()! Hence there will be a valid byte missing from
the edits log that leads to a SecondaryNameNode silent failure and a full HDFS failure upon
cluster restart. 
> We got to this point after investigating a corrupted edits file that made HDFS unable
to start with 
> {code:title=namenode.log}
> java.io.IOException: Incorrect data format. logVersion is -20 but writables.length is
>         at org.apache.hadoop.hdfs.server.namenode.FSEditLog.loadEditRecords(FSEditLog.java:450
> {code}
> EDIT: moved the logs to a comment to make this readable

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message