hbase-issues mailing list archives

From "Karthik Ranganathan (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4662) Replay the required hlog edits to make the backup preserve row atomicity.
Date Mon, 27 Feb 2012 23:21:49 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13217682#comment-13217682 ]

Karthik Ranganathan commented on HBASE-4662:

Missed this one:

<< How do you currently back up your HLogs? Do you have a process that watches .[old]logs
and copies/archives every new file appearing there? >>
We have written a task framework (code name 'cassini'). The framework is logically the equivalent
of a distributed thread pool. It manages N worker threads (one per regionserver) across M machines
(destination backup machines, for example), using ZK as the persistent store for the queue of
tasks. It can run plugins, coded to some requirements, that do arbitrary work. We have implemented
a plugin for that framework to tail and play logs. Will put that one out soon.
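
As a rough illustration of the "distributed thread pool with plugins" idea described above, here is a minimal single-process sketch. It is not the cassini code: the names `TailLogPlugin`, `worker`, and `run_pool` are hypothetical, and an in-memory queue stands in for the ZK-backed task queue.

```python
import queue
import threading


class TailLogPlugin:
    """Hypothetical plugin: pretends to tail and replay one log file."""

    def run(self, task):
        return f"replayed {task}"


def worker(tasks, plugin, results):
    # Drain tasks until the queue is empty. A real worker would instead
    # watch the persistent (ZK-backed) queue for new tasks.
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return
        results.append(plugin.run(task))


def run_pool(task_names, num_workers, plugin):
    tasks = queue.Queue()  # in-memory stand-in for the ZK task queue
    for name in task_names:
        tasks.put(name)
    results = []
    threads = [
        threading.Thread(target=worker, args=(tasks, plugin, results))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

Keeping the plugin behind a single `run(task)` method is one way to let the same pool do arbitrary work; the real plugin requirements are not stated in this thread.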

<< How do you back up the HFiles? Do you issue a flush before you do this?
That tool you mention in D. is not completebulkload, right? Will that tool deal with replaying
the logs you placed in B.5.? >>
The above 2 are in the diff. Yes, we issue a flush, and there is a custom tool. HLog replays
are not done yet; we have an initial diff which we have not yet productized.

<< I found that distributed log splitting relies on region names in the HLog in order
to do the splitting. If any region splits happened after the HLog was written, or this is
a new table, the replay will fail for regions that no longer exist. Do you plan to change
the distributed log splitter to deal with this? (It would need to map the rowkeys back to
the now-current set of regions.) >>
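
To make the suggestion in the question above concrete, here is a minimal sketch of mapping rowkeys back to the current set of regions. It assumes regions are contiguous, identified by their start keys, sorted in byte order, with the first region starting at the empty key. The function names are hypothetical and this is not the HBase log splitter API.

```python
import bisect


def region_for_row(sorted_start_keys, row_key):
    """Return the start key of the current region that holds row_key.

    Assumes sorted_start_keys is sorted and begins with b"" (first region).
    """
    idx = bisect.bisect_right(sorted_start_keys, row_key) - 1
    return sorted_start_keys[idx]


def regroup_edits(edits, sorted_start_keys):
    """Group (row_key, payload) edits by the region that owns them now,
    ignoring whatever region name was recorded when the log was written."""
    grouped = {}
    for row_key, payload in edits:
        region = region_for_row(sorted_start_keys, row_key)
        grouped.setdefault(region, []).append((row_key, payload))
    return grouped
```

With this approach an edit written before a region split still lands in whichever daughter region covers its rowkey today.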

<< HLogs have entries of many tables. In the approach above whatever replays the log
would need to only replay those entries pertaining to the HFiles copied over, right? >>
Yes, and potentially take care of changed table names (export from table A, import as table
> Replay the required hlog edits to make the backup preserve row atomicity.
> -------------------------------------------------------------------------
>                 Key: HBASE-4662
>                 URL: https://issues.apache.org/jira/browse/HBASE-4662
>             Project: HBase
>          Issue Type: Sub-task
>          Components: documentation, regionserver
>            Reporter: Karthik Ranganathan
>            Assignee: Karthik Ranganathan
> The algorithm is as follows:
> A. For HFiles:
> 1. Need to track t1,t2 for each backup (start and end times of the backup)
> 2. For point-in-time restore to time t, pick an HFile snapshot which has t2 < t
> 3. Copy HFile snapshot to a temp location - HTABLE_RESTORE_t
> B. For HLogs:
> for each regionserver do
>   for .logs and .oldlogs do
> 1. log file is hlog.TIME
> 2. if (t > TIME and hlog.TIME is open for write) fail restore for t
> 3. Pick the latest HLog whose create time is < t1
> 4. Pick all HLogs whose create time is > t1 and <= t2
> 5. Copy hlogs to the right structures inside HTABLE_RESTORE_t
> C. Split logs
> 1. Enhance HLog.splitLog to take timestamp t
> 2. Enhance distributed log split tool to pass HTABLE_RESTORE_t, so that log split is picked up and put in the right location
> 3. Enhance distributed log split tool to pass t so that all edits till t are included and others ignored
> D. Import the directory into the running HBase with META entries, etc (this already exists)
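
The selection steps in the description (A.2, B.2 to B.4, and C.3) can be sketched as below. This is only an illustration under stated assumptions: snapshots are `(t1, t2)` tuples, HLogs are `(create_time, open_for_write)` tuples for one regionserver, edits are `(timestamp, table, row)` tuples, "till t" is read as inclusive, and choosing the latest eligible snapshot is an assumption (the description only requires t2 < t). All function names are hypothetical.

```python
def pick_snapshot(snapshots, t):
    """Step A.2: pick an HFile snapshot (t1, t2) with t2 < t.
    Taking the latest such snapshot is an assumption."""
    eligible = [s for s in snapshots if s[1] < t]
    if not eligible:
        raise ValueError("no HFile snapshot finished before t")
    return max(eligible, key=lambda s: s[1])


def pick_hlogs(hlogs, t1, t2, t):
    """Steps B.2 to B.4 for one regionserver; returns chosen create times."""
    for create_time, open_for_write in hlogs:
        # B.2: a log still open for write that may hold edits before t
        if t > create_time and open_for_write:
            raise RuntimeError("restore to t not possible: open HLog")
    before = [c for c, _ in hlogs if c < t1]             # B.3 candidates
    during = sorted(c for c, _ in hlogs if t1 < c <= t2)  # B.4
    # As written, a log created exactly at t1 matches neither B.3 nor B.4.
    return ([max(before)] if before else []) + during


def edits_to_replay(edits, t, table):
    """Step C.3 plus the per-table filter discussed in the comment:
    keep edits with timestamp <= t that belong to the restored table."""
    return [e for e in edits if e[0] <= t and e[1] == table]
```

For example, with snapshots `[(10, 20), (30, 40), (50, 60)]` and t = 45, the snapshot `(30, 40)` is chosen, and HLogs created at 25, 32, and 38 would be selected from a regionserver whose closed logs were created at 5, 25, 32, 38, and 70.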
