hbase-issues mailing list archives

From "Lars Hofhansl (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version
Date Tue, 16 Aug 2011 17:15:27 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085853#comment-13085853 ]

Lars Hofhansl commented on HBASE-4071:

Thanks Stack.

re if (currentCount <= minVersions): in this case the count was already incremented by
the maxversions check (which is just outside of the context provided by the diff).

re expunging expired rows on flush: Currently it seems this is done as an optimization.
The reasons why I believe this cannot be done if minversions>0 are: (1) The cache might not
have all versions, so I cannot count the versions to determine the cut-off point. (2) Even
if we have minversions=1, there is no guarantee that the versions in the cache include the
latest one (puts could have been backdated).
In both cases I think we only have enough information to remove expired cells at compaction
time (if minversions is >0).
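
To make the cut-off rule above concrete, here is a minimal sketch. It is not the code from MinVersions.diff and not HBase API: the class name, the `retain` helper, and its parameters are hypothetical, and it assumes all versions of one cell are available, sorted newest first. The second call in `main` shows why a partial view such as the cache gives a different cut-off than the full set of versions.

```java
import java.util.ArrayList;
import java.util.List;

public class MinVersionsSketch {
    /**
     * Returns the timestamps that survive for one cell.
     * Input must be sorted newest first. Hypothetical helper, not HBase API.
     */
    public static List<Long> retain(List<Long> newestFirst, long now, long ttlMillis,
                                    int minVersions, int maxVersions) {
        List<Long> kept = new ArrayList<>();
        for (long ts : newestFirst) {
            if (kept.size() >= maxVersions) {
                break; // the maxVersions cap applies regardless of age
            }
            boolean expired = now - ts > ttlMillis;
            if (expired && kept.size() >= minVersions) {
                break; // input is sorted, so everything older is expired as well
            }
            kept.add(ts);
        }
        return kept;
    }

    public static void main(String[] args) {
        long now = 1_000L;
        long ttl = 100L;
        // Full view: every version is past the TTL, minVersions=1 keeps only the newest.
        System.out.println(retain(List.of(800L, 700L, 600L), now, ttl, 1, 3));
        // Partial view holding only a backdated put: the same rule now keeps 600.
        System.out.println(retain(List.of(600L), now, ttl, 1, 3));
    }
}
```

With the full set of versions only timestamp 800 survives; with the partial view, 600 survives instead, which is the kind of mismatch that makes counting versions at flush time unreliable.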

re comments: Of course. That's why there is version control :)
I just left these in as comments to have a place where I can put a comment as to why I believe
these are not necessary.

So you think test coverage of the existing functionality is sufficient? That is very good
to know.
I'll add tests for the new functionality.

What's the general feeling? Should I aim for minimal intrusion or attempt a bit of refactoring
to abstract these policies into an interface? Leaning towards the latter, but on the other
hand the change would be more risky.
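
As a rough illustration of what abstracting these policies into an interface could look like, here is a sketch. Everything in it is an assumption made for discussion: `RetentionPolicy`, `maxVersions`, `ttlWithMinVersions`, and `allOf` are hypothetical names and do not exist in HBase.

```java
public class RetentionPolicySketch {
    /** Hypothetical interface; decides whether one version of a cell is kept. */
    public interface RetentionPolicy {
        // newerVersionsKept: how many newer versions of this cell were already kept
        // ageMillis: age of the version under consideration
        boolean keep(int newerVersionsKept, long ageMillis);
    }

    /** Keep at most max versions. */
    public static RetentionPolicy maxVersions(int max) {
        return (kept, age) -> kept < max;
    }

    /** Keep versions within the TTL, plus up to min versions beyond it. */
    public static RetentionPolicy ttlWithMinVersions(long ttlMillis, int min) {
        return (kept, age) -> age <= ttlMillis || kept < min;
    }

    /** A version is kept only if every policy agrees. */
    public static RetentionPolicy allOf(RetentionPolicy... policies) {
        return (kept, age) -> {
            for (RetentionPolicy p : policies) {
                if (!p.keep(kept, age)) {
                    return false;
                }
            }
            return true;
        };
    }

    public static void main(String[] args) {
        RetentionPolicy policy = allOf(maxVersions(3), ttlWithMinVersions(100L, 1));
        System.out.println(policy.keep(0, 200L)); // expired, but nothing newer kept yet
        System.out.println(policy.keep(1, 200L)); // expired and minVersions already met
    }
}
```

The trade-off is the one raised above: composing small policies keeps each rule testable in isolation, but it touches more of the existing scan path than a minimal change would.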

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
> We were chatting today about our backup cluster.  What we want is to be able to restore
the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter,
if the versions are older than one week, rather than as we do with TTL where we let go of
all versions older than TTL, instead, let go of all versions EXCEPT the last one written.
So, it's like versions==1 when TTL > one week.  We want to allow that if an error is caught
within a week of its happening -- user mistakenly removes a critical table -- then we'll be
able to restore up to the moment just before catastrophe hit; otherwise, we keep one version
