hadoop-hdfs-issues mailing list archives

From "Chris Nauroth (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4811) race condition between 2 namenodes in standby that are trying to checkpoint with one another can delete or corrupt a good fsimage
Date Thu, 09 May 2013 21:57:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13653281#comment-13653281 ]

Chris Nauroth commented on HDFS-4811:
-------------------------------------

[~tlipcon] and [~andrew.wang], yes, it sounds like it would.  Then, we wouldn't need the namesystem
lock.  Is there an existing jira to track that change?

I'd also suggest embedding the namenode ID in the temp file name to differentiate between
"this" NN's checkpoint and "peer" NN's checkpoint in the rare case of a timestamp collision.
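
As a rough sketch of the naming idea only: the helper below builds a temp checkpoint file name that carries the namenode ID next to the timestamp, so two NNs writing at the same millisecond still get distinct files. The class name, method name, and the {{fsimage.ckpt_<txid>.<nnId>.<timestamp>.tmp}} layout are all hypothetical and are not taken from the HDFS code base.

```java
import java.util.Locale;

final class CheckpointTempName {
    // Hypothetical naming scheme for illustration, NOT the actual HDFS layout:
    //   fsimage.ckpt_<zero-padded txid>.<namenode ID>.<timestamp millis>.tmp
    static String forCheckpoint(String nnId, long txId, long timestampMillis) {
        if (nnId == null || nnId.isEmpty()) {
            throw new IllegalArgumentException("namenode ID is required");
        }
        return String.format(Locale.ROOT, "fsimage.ckpt_%019d.%s.%d.tmp",
                txId, nnId, timestampMillis);
    }

    public static void main(String[] args) {
        long txId = 12345L;
        long sameInstant = 1368136640000L; // deliberately identical for both NNs

        // Same txid and same timestamp, but the embedded ID keeps the names apart.
        System.out.println(forCheckpoint("nn1", txId, sameInstant));
        System.out.println(forCheckpoint("nn2", txId, sameInstant));
    }
}
```

With distinct temp names, the local checkpointer and the servlet receiving the peer's image would never open the same temp file, even if their timestamps collide.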

On Windows, we may also want to consider a JNI hook for a native rename-replace-existing call,
using {{MoveFileEx}} with flag {{MOVEFILE_REPLACE_EXISTING}}.

http://msdn.microsoft.com/en-us/library/aa365240(v=vs.85).aspx
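
For comparison, here is a minimal sketch of a rename that replaces an existing target without JNI. Note that this swaps in a different technique from the native hook suggested above: it uses {{java.nio.file.Files.move}}, which requires Java 7 or later, and whether {{ATOMIC_MOVE}} replaces an existing target is implementation specific per its Javadoc, so the behavior on Windows would need to be verified. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class RenameReplace {
    // Hypothetical helper: move src over dst, preferring an atomic rename.
    // Assumption: the platform's atomic move replaces an existing target.
    // The Javadoc leaves that implementation specific, so it may instead
    // fail with an IOException on some file systems.
    static void renameReplacing(Path src, Path dst) throws IOException {
        try {
            Files.move(src, dst, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback is not guaranteed to be atomic, but never deletes dst first.
            Files.move(src, dst, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("rename-replace-demo");
        Path fsimage = dir.resolve("fsimage");
        Path tmp = dir.resolve("fsimage.tmp");

        Files.write(fsimage, "old image".getBytes(StandardCharsets.UTF_8));
        Files.write(tmp, "new image".getBytes(StandardCharsets.UTF_8));

        renameReplacing(tmp, fsimage);

        System.out.println(new String(Files.readAllBytes(fsimage), StandardCharsets.UTF_8));
        System.out.println("temp still exists: " + Files.exists(tmp));
    }
}
```

The point of either approach is the same: the good fsimage is never deleted as a separate step before the new one is moved into place.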

                
> race condition between 2 namenodes in standby that are trying to checkpoint with one another can delete or corrupt a good fsimage
> ---------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-4811
>                 URL: https://issues.apache.org/jira/browse/HDFS-4811
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: ha
>    Affects Versions: 3.0.0, 2.0.5-beta
>            Reporter: Chris Nauroth
>
> The problem occurs under concurrent execution of the namenode running its own checkpoint
> in {{StandbyCheckpointer}} in thread 1 while also getting a checkpoint from a different namenode
> in {{GetImageServlet}} in thread 2.  It is possible for thread 2 to finish writing the checkpoint
> to the directory, but then get suspended before it has a chance to rename it to its final
> destination as an fsimage file.  Then, thread 1 wakes up and starts writing its own data to
> the checkpoint file.  When thread 2 resumes, it then tries to rename the file that thread
> 1 still holds open for writing.  Depending on the OS, this either moves thread 1's incomplete
> checkpoint to fsimage, or it just outright deletes the existing good fsimage until thread
> 1 finishes writing and renames.

