hbase-user mailing list archives

From Stack <st...@duboce.net>
Subject Re: Hooks for WAL archiving
Date Fri, 16 Sep 2011 17:31:33 GMT
On Thu, Sep 15, 2011 at 4:11 PM, lars hofhansl <lhofhansl@yahoo.com> wrote:
> A possible answer to keep all versions with no TTL, and do replication. At a certain
> size this ceases to be practical though.

Discussing point-in-time-recovery here at our shop, and trying to
avoid having to keep all versions is what prompted the below issue:

 HBASE-4071  Data GC: Remove all versions > TTL EXCEPT the last
               written version (Lars Hofhansl)

You want to support being able to restore any version?

Our thought was that the TTL would be the window during which you
could get any version, a month say, and that thereafter, only the last
written would be kept.
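
To make that retention rule concrete, here is a minimal sketch in plain Python. It is not HBase code and not how HBASE-4071 is implemented; the `(timestamp, value)` tuple shape and the function name `surviving_versions` are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical shape for illustration: (write timestamp, value).
Version = Tuple[int, str]


def surviving_versions(versions: List[Version], now: int, ttl: int) -> List[Version]:
    """Return the versions of one cell that survive garbage collection.

    Every version written inside the TTL window is kept. If all versions
    have aged out of the window, the last written one is kept anyway.
    """
    if not versions:
        return []
    # Newest first, so ordered[0] is the last written version.
    ordered = sorted(versions, key=lambda v: v[0], reverse=True)
    cutoff = now - ttl
    inside_window = [v for v in ordered if v[0] >= cutoff]
    return inside_window or [ordered[0]]
```

With a TTL of a month, any version written in the last month can still be read back; a cell that has not been written for longer than that is reduced to its single newest version.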

> A typical scenario for relational databases is to take periodic base backups and also
> archive the log files.
> Would that even work in HBase currently? Say I have distcp copy of all HBase files that
> was done while HBase was running and I
> also have an archive of all WALs since the time when the distcp started.

So, you are thinking that you would replay all WALs from the cluster
from the point in time at which the hfile copy started?

That should work.
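
A rough sketch of why replaying over a copy taken from a live cluster can be safe, in plain Python rather than against any HBase API. It assumes every edit is a put identified by (row, column, timestamp), that timestamps are server-assigned, and that deletes can be ignored; the names `restore`, `Cell` and `Edit` are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical shapes for illustration only.
Cell = Tuple[str, str, int]       # (row, column, timestamp)
Edit = Tuple[str, str, int, str]  # (row, column, timestamp, value)


def restore(base: Dict[Cell, str], wal_edits: Iterable[Edit], target_time: int) -> Dict[Cell, str]:
    """Replay archived WAL edits on top of a base copy, up to target_time.

    The base copy may already hold some of the edits, because it was taken
    while the cluster was running. Re-applying them is harmless here, since
    the same (row, column, timestamp) key just gets the same value again.
    """
    restored = dict(base)  # leave the caller's base copy untouched
    for row, column, ts, value in wal_edits:
        if ts <= target_time:
            restored[(row, column, ts)] = value
    return restored
```

Under those assumptions the order of replay and the exact moment the copy saw each file do not matter, which is what lets the WAL archive start at the beginning of the copy rather than at its end.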

Would be nice if you could filter out complete WALs by looking at
"metadata", metadata that does not currently exist: e.g. metadata
could include what regions a WAL has edits for, the range of

Or, as in HBASE-50, could roll logs first before starting the copy.
That'd narrow the number of WALs to replay for sure.
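
For the metadata idea, a small sketch of skipping whole WALs by the regions they carry edits for. As said, this metadata does not exist today, so the `WalInfo` record, its `regions` field and `wals_to_replay` are all hypothetical names in plain Python, not HBase classes.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class WalInfo:
    """Hypothetical per-WAL metadata: the file and the regions it has edits for."""
    path: str
    regions: FrozenSet[str]


def wals_to_replay(wals: Iterable[WalInfo], wanted_regions: Iterable[str]) -> List[str]:
    """Return paths of WALs that touch at least one wanted region, in input order."""
    wanted = frozenset(wanted_regions)
    # A non-empty intersection means the WAL holds edits for a wanted region.
    return [w.path for w in wals if w.regions & wanted]
```

A table-scoped restore would pass in only that table's regions, so WALs with no edits for the table are never opened.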

Would need a WAL to hfile mapreduce job.

I think the PITR would be easier if table-scoped.

Doing it cluster-wide would require our having the meta table in sync,
as you say elsewhere.  Or we just dump the state of meta when doing a
cluster backup; then at the end of PITR, when restoring a cluster, the
first thing we'd do is replace .META. (Could be an issue if tables are
deleted between start of PITR and end.)

> Could I theoretically restore HBase to a consistent state (at any time after the distcp
> finished)? Or are there changes that are not
> WAL logged that I would miss (like admin actions)?

These are not logged currently but Dhruba just opened this:

 HBASE-4401 Record log region splits and region moves in the HLog

