hbase-dev mailing list archives

From "Jim Kellerman (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-698) HLog recovery is not performed after master failure
Date Mon, 10 Nov 2008 20:07:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12646344#action_12646344 ]

Jim Kellerman commented on HBASE-698:

There is a very simple fix if the master comes back up and knows a region server is dead.

However, if the master dies, region servers hang around until the master comes back up. Thus
the master cannot know which HLogs to recover and which belong to running region servers.
("Recovering" an HLog from a running region server would produce unpredictable results, most
likely leading to data corruption.)

Relying on HDFS lease timeouts on the log files is also not an option, as the lease timeout
interval is too long for this purpose.

The master therefore cannot recover any region server's logs unless it knows that the region
server is dead.  This cannot be accomplished without ZooKeeper integration, which will monitor the
region servers (and the regions they serve) using ephemeral nodes. At that point, if the master
dies and is restarted, it will know which region servers are alive, which ones have died, and
all the regions that are currently being served. Then it will know which region server logs
to recover and which ones can be ignored (because the region server writing them is still alive).
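
The decision described above can be sketched roughly as follows. This is only an
illustration, not HBase code: `ServerId`, `parse_log_dir` and `logs_to_recover` are
hypothetical names, the directory layout `log_<host>_<startcode>_<port>` is assumed from
the example path in the issue description, and the set of live servers is assumed to
have already been read from the ephemeral nodes.

```python
from typing import Iterable, List, NamedTuple, Set


class ServerId(NamedTuple):
    """Identity of one region server instance (hypothetical helper type)."""
    host: str
    start_code: int
    port: int


def parse_log_dir(name: str) -> ServerId:
    # Assumed layout: log_<host>_<startcode>_<port>
    prefix, rest = name.split("_", 1)
    if prefix != "log":
        raise ValueError("not a region server log directory: " + name)
    host, start_code, port = rest.rsplit("_", 2)
    return ServerId(host, int(start_code), int(port))


def logs_to_recover(log_dirs: Iterable[str], live_servers: Set[ServerId]) -> List[str]:
    # A log may be recovered only if the instance that wrote it is known to be dead,
    # i.e. it has no ephemeral node. Logs of live servers must be left alone.
    return [d for d in log_dirs if parse_log_dir(d) not in live_servers]
```

Note that a restarted region server has a new start code, so the log written by its
previous instance no longer matches any live server and is selected for recovery, while
the log of the new instance is skipped.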

> HLog recovery is not performed after master failure
> ---------------------------------------------------
>                 Key: HBASE-698
>                 URL: https://issues.apache.org/jira/browse/HBASE-698
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>          Components: master, regionserver
>    Affects Versions: 0.1.2
>            Reporter: Clint Morgan
>            Assignee: Jim Kellerman
>            Priority: Blocker
>             Fix For: 0.19.0
> I have a local cluster running, and it's logging to
> <hbase>/log_X.X.X.X_1213228101021_60020/
> Then I kill both master and regionserver, and restart. Looking through
> the logs I don't see anything about trying to recover from this hlog;
> it just creates a new hlog alongside the existing one (with a new
> startcode).  The older hlog seems to be ignored, and the tables
> created in the initial session are all gone.

