hbase-user mailing list archives

From lars hofhansl <lhofha...@yahoo.com>
Subject Hooks for WAL archiving
Date Thu, 15 Sep 2011 23:11:01 GMT
I have been thinking about backup and point-in-time recovery (PITR) in HBase. This is mainly
needed in case of software errors, or when a customer asks us to restore some data they
accidentally deleted.

One possible answer is to keep all versions with no TTL and use replication. Beyond a certain
data size, though, this ceases to be practical.

A typical approach for relational databases is to take periodic base backups and also archive
the log files. Would that even work in HBase currently? Say I have a distcp copy of all HBase
files that was made while HBase was running, and I also have an archive of all WALs since the
time the distcp started.

Could I theoretically restore HBase to a consistent state (as of any time after the distcp
finished)? Or are there changes that are not WAL-logged that I would miss (like admin actions)?

If that works, a backup would involve these steps:
1. Flush all stores.
2. Copy the files.
3. Roll all logs.

#1 and #3 are really optional, but #3 is useful because it would make all logs eligible for
archiving right after the backup is done.

In any case, some hooks to act upon HLog actions would be a good thing. For example, we could
add four new methods to WALObserver (or a new observer type):

boolean preLogRoll(Path newFile)
void postLogRoll(Path newFile)

boolean preLogArchive(Path oldFile)
void postLogArchive(Path oldFile)

Returning true from the pre versions would bypass the default actions (although in this case
I am not sure how useful that would be).
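
As a rough sketch of how such hooks might be used (this is not existing HBase code): the
interface below simply restates the four proposed methods, and everything else is made up
for illustration. LogFileObserver, CopyingLogObserver, LogArchiver and WalHookDemo are
hypothetical names, java.nio.file.Path stands in for org.apache.hadoop.fs.Path so the
example runs without Hadoop, and the "default action" is assumed to be a plain move into
an old-logs directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical interface restating the four proposed hooks.
// java.nio.file.Path is a stand-in for org.apache.hadoop.fs.Path.
interface LogFileObserver {
    boolean preLogRoll(Path newFile) throws IOException;
    void postLogRoll(Path newFile) throws IOException;
    boolean preLogArchive(Path oldFile) throws IOException;
    void postLogArchive(Path oldFile) throws IOException;
}

// Example observer: copies each log to a backup directory before it is archived.
class CopyingLogObserver implements LogFileObserver {
    private final Path backupDir;

    CopyingLogObserver(Path backupDir) {
        this.backupDir = backupDir;
    }

    public boolean preLogRoll(Path newFile) { return false; } // do not bypass
    public void postLogRoll(Path newFile) { }

    public boolean preLogArchive(Path oldFile) throws IOException {
        // The file is still at its original location in the pre hook, so copy it now.
        Files.createDirectories(backupDir);
        Files.copy(oldFile, backupDir.resolve(oldFile.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
        return false; // let the default archiving proceed
    }

    public void postLogArchive(Path oldFile) { }
}

// Stand-in for the code that would call the hooks (assumed behavior, not HBase's).
class LogArchiver {
    static void archive(Path oldFile, Path oldLogDir, LogFileObserver observer)
            throws IOException {
        if (observer.preLogArchive(oldFile)) {
            return; // observer asked to bypass the default action
        }
        Files.createDirectories(oldLogDir);
        Files.move(oldFile, oldLogDir.resolve(oldFile.getFileName()),
                StandardCopyOption.REPLACE_EXISTING);
        observer.postLogArchive(oldFile);
    }
}

class WalHookDemo {
    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("wal-hook-demo");
        Path log = Files.write(root.resolve("hlog.1"), "edits".getBytes());
        LogArchiver.archive(log, root.resolve("oldlogs"),
                new CopyingLogObserver(root.resolve("backup")));
        System.out.println("backup copy exists: "
                + Files.exists(root.resolve("backup").resolve("hlog.1")));
    }
}
```

The copy is done in the pre hook because the file has not yet been moved at that point; with
only a Path oldFile argument, a post hook would need some other way to find the archived file.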

That way it would be possible to act upon HLog file activity and (for example) archive these
files somewhere.

Of course one could also just watch the directories in HDFS periodically, but that seems awkward.


-- Lars
