hadoop-hdfs-issues mailing list archives

From "Todd Lipcon (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-2709) HA: Appropriately handle error conditions in EditLogTailer
Date Fri, 06 Jan 2012 01:35:39 GMT

    [ https://issues.apache.org/jira/browse/HDFS-2709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13181031#comment-13181031 ]

Todd Lipcon commented on HDFS-2709:
-----------------------------------

I'm skeptical of the fix -- the question is _why_ we see the wrong log version here. We investigated,
and it looks like there's a race when a log file is created: the writer first preallocates the file,
filling it with 0xFF bytes, and only then goes back and writes the version number. A reader that opens
the file in between sees 0xFFFFFFFF where the version should be. Adding a sleep() after the
preallocate() call in EditLogFileOutputStream reproduces this reliably. So, I think we should
file another JIRA to fix that race.
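
To illustrate the ordering problem described above, here is a minimal sketch that models the log file as an in-memory byte array. It is not the EditLogFileOutputStream code; the class name, the helper methods, and the layout version value -40 are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class PreallocRaceSketch {
    // Hypothetical layout version, chosen only for illustration.
    static final int LAYOUT_VERSION = -40;

    /** Step 1 of file creation: fill the whole region with 0xFF bytes. */
    static byte[] preallocate(int size) {
        byte[] file = new byte[size];
        Arrays.fill(file, (byte) 0xFF);
        return file;
    }

    /** Step 2: go back and write the version into the first four bytes. */
    static void writeVersion(byte[] file, int version) {
        ByteBuffer.wrap(file).putInt(0, version);
    }

    /** What a reader does: interpret the first four bytes as an int. */
    static int readVersion(byte[] file) {
        return ByteBuffer.wrap(file).getInt(0);
    }

    public static void main(String[] args) {
        byte[] file = preallocate(1024);
        // A reader that wins the race looks at the file here,
        // after preallocation but before the header is written.
        int seenDuringWindow = readVersion(file);
        writeVersion(file, LAYOUT_VERSION);
        int seenAfter = readVersion(file);
        System.out.println(seenDuringWindow + " " + seenAfter);
    }
}
```

Running main prints `-1 -40`: the four 0xFF bytes read back as the int -1, which is the "wrong log version" a reader would see inside the window.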

Separately, I agree that we should probably change this to throw an exception instead of using an
assert. But I think LogHeaderCorruptException is probably a better choice.
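
A rough sketch of what replacing the assert with an exception could look like. The exception class below is a local stand-in with the same name as the one suggested above, not the Hadoop class, and readLogVersion and its parameters are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

class HeaderCheckSketch {
    /** Local stand-in for illustration; not the Hadoop class of the same name. */
    static class LogHeaderCorruptException extends IOException {
        LogHeaderCorruptException(String msg) {
            super(msg);
        }
    }

    /**
     * Reads the version from the start of the stream and throws instead of
     * asserting, so the check still runs when assertions are disabled.
     */
    static int readLogVersion(DataInputStream in, int expectedVersion)
            throws IOException {
        int version;
        try {
            version = in.readInt();
        } catch (EOFException e) {
            throw new LogHeaderCorruptException("No version found in log header");
        }
        if (version != expectedVersion) {
            throw new LogHeaderCorruptException(
                "Unexpected version " + version + ", expected " + expectedVersion);
        }
        return version;
    }

    public static void main(String[] args) throws IOException {
        // A header that still holds the preallocation fill bytes.
        byte[] header = {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
        try {
            readLogVersion(new DataInputStream(new ByteArrayInputStream(header)), -40);
        } catch (LogHeaderCorruptException e) {
            System.out.println("corrupt header: " + e.getMessage());
        }
    }
}
```

Because the stand-in extends IOException, a caller can tell a bad header apart from other I/O failures and decide whether to retry or abort.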
                
> HA: Appropriately handle error conditions in EditLogTailer
> ----------------------------------------------------------
>
>                 Key: HDFS-2709
>                 URL: https://issues.apache.org/jira/browse/HDFS-2709
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ha, name-node
>    Affects Versions: HA branch (HDFS-1623)
>            Reporter: Todd Lipcon
>            Assignee: Aaron T. Myers
>            Priority: Critical
>         Attachments: HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch,
> HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch, HDFS-2709-HDFS-1623.patch
>
>
> Currently, if the edit log tailer experiences an error replaying edits in the middle of
> a file, it will go back to retrying from the beginning of the file on the next tailing iteration.
> This is incorrect, since many of the edits will have already been replayed, and not all edits
> are idempotent.
> Instead, we either need to (a) support reading from the middle of a finalized file (i.e.
> skip those edits already applied), or (b) abort the standby if it hits an error while tailing.
> If (a) isn't simple, let's do (b) for now and come back to (a) later, since this is a rare
> circumstance and it is better to abort than to be incorrect.
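
Option (a) in the description above can be sketched as a tailer that remembers the last edit it applied and skips anything at or below that point on a retry. This is a toy model, not the EditLogTailer implementation: it assumes each edit carries a monotonically increasing transaction id, and the class, field, and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class EditTailerSketch {
    /** Toy edit record; assumes every edit has an increasing transaction id. */
    static final class Edit {
        final long txId;
        final String op;

        Edit(long txId, String op) {
            this.txId = txId;
            this.op = op;
        }
    }

    private long lastAppliedTxId = 0;
    final List<String> applied = new ArrayList<>();

    /**
     * Replays a file from its beginning but skips edits that were already
     * applied. failAtTxId simulates an error on that edit; pass -1 for none.
     * Returns the number of edits newly applied by this call.
     */
    int tail(List<Edit> file, long failAtTxId) {
        int newlyApplied = 0;
        for (Edit e : file) {
            if (e.txId <= lastAppliedTxId) {
                continue; // already replayed in an earlier iteration
            }
            if (e.txId == failAtTxId) {
                // Under option (b) the caller would abort the standby here
                // instead of calling tail() again.
                throw new IllegalStateException("simulated error at txid " + e.txId);
            }
            applied.add(e.op);
            lastAppliedTxId = e.txId;
            newlyApplied++;
        }
        return newlyApplied;
    }
}
```

With this bookkeeping, a retry after a mid-file failure applies only the edits that were not replayed before, so a non-idempotent edit is never applied twice.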

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
