hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3335) check for edit log corruption at the end of the log
Date Fri, 11 May 2012 00:13:55 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3335?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13272929#comment-13272929 ]

Todd Lipcon commented on HDFS-3335:
-----------------------------------

In {{EditLogFileInputStream.nextOp}}, we should log a WARN message with the file name and
data on how many bytes are skipped at the end of the file. This way, if there is an error
replaying later, you might notice that in fact you did want to recover some of these edits.
Having the warning in the log will make it easier to find where they went.

In this place, it would also be nice to detect how many of those bytes were just 0xffffffff
padding vs data that potentially looks like transactions.
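A rough sketch of the two suggestions above, to make the idea concrete: count how many trailing bytes are padding versus possibly real data, and put both numbers plus the file name in one WARN line. Everything here is hypothetical ({{TrailingBytesSummary}}, {{scan}}, {{describe}} are not names from the patch), it assumes a padding byte is 0xFF, and it uses java.util.logging only so that it compiles without Hadoop's logging classes.

```java
import java.util.logging.Logger;

/** Hypothetical helper, not part of HDFS: summarizes the bytes skipped at the end of an edit log. */
final class TrailingBytesSummary {
    private static final Logger LOG = Logger.getLogger(TrailingBytesSummary.class.getName());

    /** Number of skipped bytes that look like padding. */
    final long paddingBytes;
    /** Number of skipped bytes that are not padding and might be transactions. */
    final long otherBytes;

    private TrailingBytesSummary(long paddingBytes, long otherBytes) {
        this.paddingBytes = paddingBytes;
        this.otherBytes = otherBytes;
    }

    /** Assumption for this sketch: a padding byte is 0xFF. */
    static boolean isPadding(byte b) {
        return b == (byte) 0xFF;
    }

    /** Classifies every byte found after the end-of-log marker. */
    static TrailingBytesSummary scan(byte[] trailing) {
        long padding = 0;
        for (byte b : trailing) {
            if (isPadding(b)) {
                padding++;
            }
        }
        return new TrailingBytesSummary(padding, trailing.length - padding);
    }

    /** Builds the warning text, including the file name so the skipped edits can be found later. */
    String describe(String fileName) {
        return String.format(
            "Skipping %d bytes at the end of edit log %s: %d padding, %d possibly transaction data",
            paddingBytes + otherBytes, fileName, paddingBytes, otherBytes);
    }

    /** Logs the summary at WARN level. */
    void warn(String fileName) {
        LOG.warning(describe(fileName));
    }
}
```

Keeping the classification separate from the logging call would let a test assert on the counts without capturing log output.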

----

- Rename {{GarbageAfterTerminatorException.getOffset}} to something a little more clear --
right now it's not obvious that this is a relative offset/length after the OP_INVALID, versus
an offset since the beginning of the file, etc. Perhaps {{getPaddingLengthAfterEofMarker}}?
I'm still not entirely clear what this length represents... by my reading of the javadoc,
it is:

{code}
<--- valid edits ---> < OP_INVALID > <-- N bytes of padding --> <-- non-padding data --> EOF
{code}
where {{N}} above is what you're talking about?

Maybe some ASCII art like the above in the javadoc would be helpful.

Part of what is confusing me is this: does padding after OP_INVALID count as garbage or not?
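For illustration, here is roughly what the renamed getter with the diagram in its javadoc could look like. This is only a sketch of my reading above, which may be wrong: the field, the constructor, and the message text are invented for the example and are not taken from the patch.

```java
import java.io.IOException;

/**
 * Hypothetical sketch of the exception with a clearer getter name.
 * Only the class name comes from the patch; the body is assumed.
 */
class GarbageAfterTerminatorException extends IOException {
    private static final long serialVersionUID = 1L;

    private final long paddingLengthAfterEofMarker;

    GarbageAfterTerminatorException(long paddingLengthAfterEofMarker) {
        super("Found non-padding data " + paddingLengthAfterEofMarker
            + " bytes after the end-of-log marker");
        this.paddingLengthAfterEofMarker = paddingLengthAfterEofMarker;
    }

    /**
     * Assumed file layout:
     * <pre>{@code
     * <--- valid edits ---> < OP_INVALID > <-- N bytes of padding --> <-- non-padding data --> EOF
     * }</pre>
     *
     * @return N, the number of padding bytes between the OP_INVALID marker and the
     *         first non-padding byte. This is relative to the marker, not an
     *         offset from the beginning of the file.
     */
    long getPaddingLengthAfterEofMarker() {
        return paddingLengthAfterEofMarker;
    }
}
```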

----

{code}
+  /** Testing hook */
+  void setEditLog(FSEditLog newLog) {
{code}

Can you add @VisibleForTesting and rename it to {{setEditLogForTesting}} so no one starts to
use it in non-test code?
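Something along these lines, as a sketch only. The annotation meant here is Guava's {{com.google.common.annotations.VisibleForTesting}}; the example declares a local stand-in with the same name so that it compiles without Guava. {{EditLogHolder}} and the empty {{FSEditLog}} are placeholders, since the excerpt does not show which class declares the setter.

```java
/** Local stand-in for Guava's @VisibleForTesting, declared here only so the sketch compiles alone. */
@interface VisibleForTesting {}

/** Placeholder for the real FSEditLog class. */
class FSEditLog {}

/** Hypothetical owner of the edit log; the real declaring class is not shown in the excerpt. */
class EditLogHolder {
    private FSEditLog editLog = new FSEditLog();

    FSEditLog getEditLog() {
        return editLog;
    }

    /** Testing hook: swaps in a different edit log. Not meant for non-test callers. */
    @VisibleForTesting
    void setEditLogForTesting(FSEditLog newLog) {
        this.editLog = newLog;
    }
}
```

The "ForTesting" suffix makes misuse visible at every call site, while the annotation documents the intent at the declaration.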

----

- Lots of spurious whitespace changes in TestNameNodeRecovery
- Can you add brief javadoc to the three implementations of Corruptor? eg "/** Truncate the
last byte of the file */", "/** Add padding followed by some non-padding bytes to the end of
the file */" and "/** Add only padding to the end of the file */"?

Otherwise really nice tests.
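
To show what the one-line javadoc on each Corruptor might look like in place, here is a self-contained sketch. The interface shape, the class names, {{PADDING_LENGTH}}, the 0xFF padding byte, and the garbage bytes are all assumptions made for the example; the real implementations in TestNameNodeRecovery may differ.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical shape of the test interface; the real one may have more methods. */
interface Corruptor {
    /** Assumed amount of padding to append, in bytes. */
    int PADDING_LENGTH = 16;

    void corrupt(File editFile) throws IOException;
}

/** Truncate the last byte of the file. */
class TruncatingCorruptor implements Corruptor {
    @Override
    public void corrupt(File editFile) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(editFile, "rw")) {
            long length = raf.length();
            if (length > 0) {
                raf.setLength(length - 1);
            }
        }
    }
}

/** Add padding followed by some non-padding bytes to the end of the file. */
class PaddingThenGarbageCorruptor implements Corruptor {
    @Override
    public void corrupt(File editFile) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(editFile, "rw")) {
            raf.seek(raf.length());
            for (int i = 0; i < PADDING_LENGTH; i++) {
                raf.write(0xFF); // assumed padding byte
            }
            raf.write(new byte[] {0x0D, 0x0E, 0x0A, 0x0D}); // arbitrary non-padding bytes
        }
    }
}

/** Add only padding to the end of the file. */
class PaddingOnlyCorruptor implements Corruptor {
    @Override
    public void corrupt(File editFile) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(editFile, "rw")) {
            raf.seek(raf.length());
            for (int i = 0; i < PADDING_LENGTH; i++) {
                raf.write(0xFF); // assumed padding byte
            }
        }
    }
}
```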

                
> check for edit log corruption at the end of the log
> ---------------------------------------------------
>
>                 Key: HDFS-3335
>                 URL: https://issues.apache.org/jira/browse/HDFS-3335
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 0.23.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-3335-b1.001.patch, HDFS-3335-b1.002.patch, HDFS-3335-b1.003.patch,
HDFS-3335-b1.004.patch, HDFS-3335.001.patch, HDFS-3335.002.patch, HDFS-3335.003.patch, HDFS-3335.004.patch,
HDFS-3335.005.patch, HDFS-3335.006.patch, HDFS-3335.007.patch
>
>
> Even after encountering an OP_INVALID, we should check the end of the edit log to make
sure that it contains no more edits.
> This will catch things like rare race conditions or log corruptions that would otherwise
remain undetected.  They will go from being silent data loss scenarios to being cases that
we can detect and fix.
> Using recovery mode, we can choose to ignore the end of the log if necessary.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
