hadoop-hdfs-issues mailing list archives

From "Aaron T. Myers (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1378) Edit log replay should track and report file offsets in case of errors
Date Tue, 05 Apr 2011 23:00:06 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016170#comment-13016170 ]

Aaron T. Myers commented on HDFS-1378:
--------------------------------------

Patch looks pretty solid, Todd, and very helpful. One comment:

There are large classes of edit log corruption that result in an exception other than an
IOE being thrown, but this debugging info is only printed when an IOE is thrown. I've twice
now had to change this code to catch NPE and recompile to get it to print this info. Ideally,
I think we'd put this stuff in a "{{catch (Throwable t)}}" block, with the actual exception
being re-thrown after printing.
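
A minimal sketch of the suggestion above, not the code in the attached patch. The class `EditLogReplaySketch`, the `OpApplier` interface, and the one-byte "opcode" format are hypothetical stand-ins for illustration; none of them are Hadoop APIs. It assumes Java 7 or later, where a caught `Throwable` that is not reassigned can be rethrown from a method that declares only `IOException`.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

final class EditLogReplaySketch {

    /** Hypothetical stand-in for applying one edit to the namespace. */
    public interface OpApplier {
        void apply(byte opcode) throws IOException;
    }

    /** Replays one-byte "opcodes" and returns the number applied. */
    public static long replay(byte[] log, OpApplier applier, StringBuilder diagnostics)
            throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(log));
        long offset = 0;    // offset of the op currently being applied
        long numEdits = 0;  // ops applied successfully so far
        try {
            while (in.available() > 0) {
                byte opcode = in.readByte();
                applier.apply(opcode);
                offset++;
                numEdits++;
            }
            return numEdits;
        } catch (Throwable t) {
            // Record the debugging info for any failure, not only IOException,
            // so an NPE caused by a corrupt op is reported as well.
            diagnostics.append("Error replaying edit log at offset ").append(offset)
                       .append(" after ").append(numEdits).append(" edits: ").append(t);
            throw t; // rethrow the original exception unchanged
        }
    }

    public static void main(String[] args) {
        StringBuilder diagnostics = new StringBuilder();
        try {
            // Opcode 2 simulates a corruption that surfaces as an NPE, not an IOE.
            replay(new byte[] {1, 1, 2, 1}, op -> {
                if (op == 2) {
                    throw new NullPointerException("corrupt op");
                }
            }, diagnostics);
        } catch (Throwable t) {
            System.err.println(diagnostics);
        }
    }
}
```

The caller still sees the original exception type, so existing error handling is unaffected; only the diagnostic output changes.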

> Edit log replay should track and report file offsets in case of errors
> ----------------------------------------------------------------------
>
>                 Key: HDFS-1378
>                 URL: https://issues.apache.org/jira/browse/HDFS-1378
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: name-node
>    Affects Versions: 0.22.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>         Attachments: hdfs-1378-branch20.txt
>
>
> Occasionally there are bugs or operational mistakes that result in corrupt edit logs
> which I end up having to repair by hand. In these cases it would be very handy to have the
> error message also print out the file offsets of the last several edit log opcodes so it's
> easier to find the right place to edit in the OP_INVALID marker. We could also use this facility
> to provide a rough estimate of how far along edit log replay the NN is during startup (handy
> when a 2NN has died and replay takes a while).
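
One way the offset tracking described in the issue could be sketched, assuming a small bounded buffer that remembers the last few opcodes and where they started. The class `RecentOpOffsets` and its methods are hypothetical names for illustration, not part of the patch or of Hadoop; the progress estimate simply compares the current offset with the file length.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.StringJoiner;

final class RecentOpOffsets {
    private final int capacity;
    // Each entry is {opcode, offset}; the oldest entry is at the head.
    private final Deque<long[]> recent = new ArrayDeque<>();

    public RecentOpOffsets(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    /** Remembers an opcode and its starting offset, evicting the oldest if full. */
    public void record(int opcode, long offset) {
        if (recent.size() == capacity) {
            recent.removeFirst();
        }
        recent.addLast(new long[] {opcode, offset});
    }

    /** Formats the remembered opcodes, oldest first, for an error message. */
    public String describe() {
        StringJoiner joiner = new StringJoiner(", ");
        for (long[] entry : recent) {
            joiner.add("opcode " + entry[0] + " at offset " + entry[1]);
        }
        return joiner.toString();
    }

    /** Rough replay progress in percent, based only on position in the file. */
    public static int percentComplete(long offset, long fileLength) {
        if (fileLength <= 0) {
            return 0;
        }
        return (int) Math.min(100.0, offset * 100.0 / fileLength);
    }
}
```

A replay loop would call `record` before applying each op and include `describe()` in the message when an error is caught. The percentage is only a rough guide, since it measures bytes read rather than time spent applying edits.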

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
