hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3479) backport HDFS-3335 (check for edit log corruption at the end of the log) to branch-1
Date Wed, 30 May 2012 21:29:24 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13286080#comment-13286080 ]

Colin Patrick McCabe commented on HDFS-3479:
--------------------------------------------

Hi Nicholas,

In the patch it says "We don't check the last two megabytes of the edit log, in case the NameNode
crashed while writing to the edit log."

Basically, if we crash while writing to the end of the log, the underlying filesystem does
not guarantee that every byte we wrote reaches the disk intact.  Consider the following
sequence of events:

1. The NN preallocates an extra 1 MB at the end of the file and fills it with 0xff bytes.
2. The NN writes an opcode to the edit log file.  The opcode happens to span two sectors on the hard disk.
3. The kernel flushes the sector holding the second half of the opcode to disk, but not the sector holding the first half.
4. The system crashes.

In this case, we're left with a file that looks like this:
{code}
0xff 0xff 0xff 0xff ... [second half of the opcode bytes] ... 0xff 0xff 0xff
{code}
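The crash sequence can be modeled in a few lines.  This is a hypothetical sketch, not code from the patch: it assumes 512-byte sectors, assumes the 0xff preallocation from step 1 already reached the disk, and treats the "page cache" as a second byte array whose sectors are copied to the "disk" only when flushed.

```java
import java.util.Arrays;

// Hypothetical model of a torn write: a "disk" image plus an in-memory
// "page cache" image whose sectors reach the disk only when flushed.
class TornWriteDemo {
    static final int SECTOR = 512;         // assumed sector size
    static final byte FILL = (byte) 0xff;  // preallocation fill byte (step 1)

    // Returns the on-disk image after the crash. Assumes preallocBytes is a
    // multiple of SECTOR and that the op fits inside the preallocated region.
    static byte[] simulate(int preallocBytes, int opOffset, byte[] opcode,
                           int[] flushedSectors) {
        byte[] disk = new byte[preallocBytes];
        Arrays.fill(disk, FILL);            // step 1: preallocation is on disk
        byte[] cache = disk.clone();
        // step 2: the op is written, but only into the page cache
        System.arraycopy(opcode, 0, cache, opOffset, opcode.length);
        // step 3: the kernel flushes only some sectors, in an order we don't control
        for (int s : flushedSectors) {
            System.arraycopy(cache, s * SECTOR, disk, s * SECTOR, SECTOR);
        }
        return disk;                        // step 4: crash; unflushed sectors are lost
    }
}
```

For example, an 8-byte op written at offset 508 straddles sectors 0 and 1; flushing only sector 1 leaves bytes 508..511 as 0xff and bytes 512..515 holding the op's second half, which is exactly the layout shown above.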

This would clearly fail validation, so the NameNode would fail to start, even though no
data has been lost (the opcode was never acked to the client).  That would be a serious problem.
Not validating the last UNCHECKED_REGION_LENGTH bytes of the log fixes it.
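A simplified model of the tail check, under stated assumptions: the class `TailCheck`, the method `findCorruption`, and the `endOfValidOps` parameter (the offset where the reader stopped decoding ops) are hypothetical names, not the patch's API.  The idea sketched is that every byte after the last valid op should be 0xff padding, except inside the final unchecked region, where a torn write like the one above may legitimately leave garbage.

```java
// Hypothetical sketch of a tail check that tolerates torn writes at the very end.
final class TailCheck {
    // Two megabytes, per the patch comment quoted above.
    static final int UNCHECKED_REGION_LENGTH = 2 * 1024 * 1024;

    // Returns the offset of the first non-padding byte found after the last
    // valid op and before the unchecked tail region, or -1 if none is found.
    static int findCorruption(byte[] log, int endOfValidOps, int uncheckedRegionLength) {
        // Stop checking where the unchecked tail region begins.
        int checkedEnd = Math.max(endOfValidOps, log.length - uncheckedRegionLength);
        for (int i = endOfValidOps; i < checkedEnd; i++) {
            if (log[i] != (byte) 0xff) {
                return i;  // garbage well before the tail: likely real corruption
            }
        }
        return -1;
    }

    static int findCorruption(byte[] log, int endOfValidOps) {
        return findCorruption(log, endOfValidOps, UNCHECKED_REGION_LENGTH);
    }
}
```

With a 512-byte unchecked region on a 4096-byte log, a stray byte at offset 4000 is ignored, while one at offset 1000 is reported as corruption.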

We can't control the order in which the kernel flushes sectors out of the buffer cache and
onto the hard disk.  We can set up barriers (that is what fsync does), but the ordering of
writes between barriers is beyond our control.
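In Java, `FileChannel.force` plays the role of fsync.  The sketch below is a hypothetical helper (`SyncedAppender` and `appendAndSync` are illustrative names, not HDFS classes): it shows that a sync is a barrier only.  Once `force` returns, all bytes of the op are durable, but while the write is in flight the kernel may still flush its sectors in any order, which is why the torn write above remains possible.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical appender that syncs each op before it would be acked.
final class SyncedAppender implements AutoCloseable {
    private final FileChannel channel;

    SyncedAppender(Path file) throws IOException {
        channel = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.APPEND);
    }

    // Writes one serialized op and returns only after force() completes.
    // Before that point the kernel may flush the op's sectors in any order.
    void appendAndSync(byte[] op) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(op);
        while (buf.hasRemaining()) {
            channel.write(buf);
        }
        // Barrier: file content must be on the device once this returns;
        // metadata-only updates need not be forced (false).
        channel.force(false);
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
```

Acking the client only after `appendAndSync` returns is what makes the crash scenario harmless: an op caught mid-flush was never acknowledged.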
                
> backport HDFS-3335 (check for edit log corruption at the end of the log) to branch-1
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-3479
>                 URL: https://issues.apache.org/jira/browse/HDFS-3479
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 1.0.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-3335-b1.005.patch
>
>
> backport HDFS-3335 (check for edit log corruption at the end of the log) to branch-1

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
