hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version
Date Sun, 14 Aug 2011 05:12:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084786#comment-13084786 ]

Lars Hofhansl commented on HBASE-4071:
--------------------------------------

Having had a very brief look at the code, it seems that wouldn't even be that hard.
If we had (say) another CF attribute called "minversions", we could check for it in ScanQueryMatcher.match(...)
as follows:
If minversions > 0 we'd still do the CF expiry check (isExpired), but instead of bailing
right there, we'd remember the resulting MatchCode and continue all the way to checkColumn(...)
of the ColumnTracker. checkColumn would receive the MatchCode from the expiry check as an additional
argument. Deletes and filters should still behave correctly as far as I can see.

In ColumnTracker.checkColumn we now have three cases:
1. # versions < minversions: behave as if the version check was positive
2. minversions < # versions < maxversions: return whatever the CF expiry check would
have returned. (the details are probably a bit more tricky)
3. # versions > maxversions: same behavior as before.
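
The three cases above could be sketched roughly as follows. This is only an
illustration, not HBase code: the MatchCode enum, the MinVersionsSketch class and
the expiryCheck helper are hypothetical stand-ins, SEEK_NEXT_COL as the result
for an expired cell is an assumption, and so is the handling of the boundaries
(a version is kept while fewer than minversions have been included, and dropped
once maxversions have been included). It also assumes the versions of a column
arrive newest first.

```java
// Hypothetical stand-in for the matcher's result codes (not the HBase enum).
enum MatchCode { INCLUDE, SEEK_NEXT_COL }

// Hypothetical sketch of a column tracker that honours both minversions
// and maxversions for a single column.
final class MinVersionsSketch {
    private final int minVersions;
    private final int maxVersions;
    private int versionsIncluded = 0;

    MinVersionsSketch(int minVersions, int maxVersions) {
        this.minVersions = minVersions;
        this.maxVersions = maxVersions;
    }

    // Stand-in for the CF expiry check: instead of bailing out of match(...),
    // the caller remembers this result and passes it on to checkColumn.
    static MatchCode expiryCheck(long cellTimestamp, long now, long ttlMillis) {
        return (now - cellTimestamp > ttlMillis)
                ? MatchCode.SEEK_NEXT_COL   // assumed result for an expired cell
                : MatchCode.INCLUDE;
    }

    // Called once per version of the column, newest version first.
    MatchCode checkColumn(MatchCode expiryResult) {
        // Case 3: already at maxversions, same behavior as before.
        if (versionsIncluded >= maxVersions) {
            return MatchCode.SEEK_NEXT_COL;
        }
        // Case 1: fewer than minversions kept so far, keep it even if expired.
        if (versionsIncluded < minVersions) {
            versionsIncluded++;
            return MatchCode.INCLUDE;
        }
        // Case 2: between the two limits, the expiry check decides.
        if (expiryResult == MatchCode.INCLUDE) {
            versionsIncluded++;
        }
        return expiryResult;
    }
}
```

With minversions = 1 and every version of a column older than the TTL, the
first (newest) version is still included and all older ones are skipped, which
is the behavior the issue asks for.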


> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>
> We were chatting today about our backup cluster.  What we want is to be able to restore
> the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter,
> if the versions are older than one week, rather than as we do with TTL where we let go of
> all versions older than TTL, instead, let go of all versions EXCEPT the last one written.
> So, it's like versions==1 when TTL > one week.  We want to allow that if an error is caught
> within a week of its happening -- user mistakenly removes a critical table -- then we'll be
> able to restore up to the moment just before catastrophe hit; otherwise, we keep one version
> only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
