hbase-issues mailing list archives

From "Richard Lackey (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2643) Figure how to deal with eof splitting logs
Date Thu, 12 Aug 2010 11:21:16 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2643?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12897697#action_12897697 ]

Richard Lackey commented on HBASE-2643:

splitLog assumes that no RuntimeException will occur inside it. Such an exception is
unlikely, but it is possible. Should one occur, it will propagate up through HMaster,
which will then fail to join the cluster.
This is a general condition. HDFS assumes that this cannot occur (or that an upper layer
will catch it); that is, none of the lower layers catches Exception to stop a
RuntimeException, e.g., NullPointerException, from propagating. If the SequenceFile
contains garbage (or has been corrupted), the chance that the underlying DataInputStream
throws a RuntimeException increases.

The solution is to add a catch for Exception in splitLog and consider the log corrupt.
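
A minimal sketch of that idea, not the actual HLog code: the class, the Result enum, and the length-prefixed record format below are hypothetical stand-ins for the real splitLog and SequenceFile reader. It only illustrates catching Exception around the read loop so that a RuntimeException caused by garbage marks the log corrupt instead of escaping to the caller.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names come from HBase.
class SplitLogSketch {
    /** Outcome of reading one log file. */
    enum Result { OK, TRUNCATED, CORRUPT }

    /**
     * Reads length-prefixed records from an in-memory "log" (an assumed,
     * simplified format) and appends each complete record to edits.
     */
    static Result splitLog(byte[] log, List<byte[]> edits) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(log))) {
            while (in.available() > 0) {
                int len = in.readInt();
                // A garbage (negative) length throws NegativeArraySizeException,
                // which is a RuntimeException, not an IOException.
                byte[] edit = new byte[len];
                in.readFully(edit);
                edits.add(edit);
            }
            return Result.OK;
        } catch (EOFException e) {
            // The file ended mid-record: the writer probably died while flushing.
            return Result.TRUNCATED;
        } catch (IOException e) {
            return Result.CORRUPT;
        } catch (Exception e) {
            // Any RuntimeException raised while parsing is treated as corruption
            // rather than being allowed to propagate to the caller.
            return Result.CORRUPT;
        }
    }

    public static void main(String[] args) {
        List<byte[]> edits = new ArrayList<>();
        byte[] garbage = {(byte) 0xff, (byte) 0xff, (byte) 0xff, (byte) 0xff};
        System.out.println(splitLog(garbage, edits));
    }
}
```

Edits read before the failure stay in the list, so the caller can still decide whether to keep them or discard the whole file. Errors such as OutOfMemoryError are deliberately not caught in this sketch.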

> Figure how to deal with eof splitting logs
> ------------------------------------------
>                 Key: HBASE-2643
>                 URL: https://issues.apache.org/jira/browse/HBASE-2643
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Priority: Blocker
>             Fix For: 0.90.0
> When splitting the WAL and encountering EOF, it's not clear what to do. Initial discussion
> of this started in http://review.hbase.org/r/74/ - summarizing here for brevity:
> We can get an EOFException while splitting the WAL in the following cases:
> - The writer died after creating the file but before even writing the header (or crashed
>   halfway through writing the header)
> - The writer died in the middle of flushing some data - sync() guarantees that we can
>   see _at least_ the last edit, but we may see half of an edit that was being written out when
>   the RS crashed (especially for large rows)
> - The data was actually corrupted somehow (e.g., a length field got changed to be too long
>   and thus points past EOF)
> Ideally we would know when we see EOF whether it was really the last record, and in that
> case, simply drop that record (it wasn't synced, so we don't need to split it). Some
> open questions:
>   - Currently we ignore empty files. Is it ok to ignore an empty log file if it's not
>     the last one?
>   - Similarly, do we ignore an EOF mid-record if it's not the last log file?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
