hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version
Date Tue, 16 Aug 2011 05:38:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085541#comment-13085541 ]

stack commented on HBASE-4071:
------------------------------

@LarsH

Patch looks great.  Minor intrusion.  I'd say the best way to build confidence that all works
as it did is to run all unit tests -- IIRC we should have coverage enough to notice if you've
broken the default behavior -- and then, for your additions, work up a few unit tests that
exercise the new functionality.  (This will also uncover any other code your patch should have
touched.  I don't think you've missed anything, since the same Scan code is used when compacting,
but you might want to verify that we do the right thing when compacting, especially major vs.
minor, around this new minimum versions config.)

Is this right?

+      if (currentCount <= minVersions)

The other two instances preincrement currentCount.  This instance doesn't.  Is there a reason?
(If so, it probably deserves a comment.)  It also looks like there is a bit of repeated code
here that perhaps could be factored out into a method of its own?
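
To illustrate why the missing preincrement matters, here is a minimal sketch. It is not taken from MinVersions.diff: the class and method names are hypothetical, and only currentCount and minVersions echo the quoted line. It shows that counting before comparing and comparing before counting differ by one kept version, and how the repeated check could live in a single helper. Whether the third instance in the patch is actually off by one depends on whether currentCount was already incremented earlier on that code path, which the quoted line alone does not show.

```java
// Hypothetical sketch; names are illustrative and not taken from the patch.
final class VersionCountSketch {
    private final int minVersions;
    private int currentCount = 0;

    VersionCountSketch(int minVersions) {
        this.minVersions = minVersions;
    }

    /** Counts the current version first, then reports whether it is within the minimum to keep. */
    boolean countAndCheckMin() {
        return ++currentCount <= minVersions;
    }

    /** Compares first and counts afterwards; admits one more version than countAndCheckMin. */
    boolean checkMinThenCount() {
        return currentCount++ <= minVersions;
    }

    public static void main(String[] args) {
        VersionCountSketch pre = new VersionCountSketch(2);
        VersionCountSketch post = new VersionCountSketch(2);
        for (int i = 1; i <= 4; i++) {
            System.out.println("version " + i
                + ": count-then-compare keeps=" + pre.countAndCheckMin()
                + ", compare-then-count keeps=" + post.checkMinThenCount());
        }
    }
}
```

With minVersions set to 2, the first helper keeps two versions and the second keeps three, which is the kind of difference a short comment in the patch would settle.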

This is sketch code but FYI, around these parts folks remove code rather than comment it out...

On 1. above, 'Expired rows can no longer be expunged...', why not?  Are we doing this now?
(I don't remember.)  If so, why can't we still do it?

On 2., it doesn't look too bad to me.

On 7., yes.

Good on you LarsH.

> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore
> the dataset from any point in time, but only within a limited timeframe -- say one week.
> Thereafter, if the versions are older than one week, rather than letting go of all versions
> older than TTL as we do with TTL today, let go of all versions EXCEPT the last one written.
> So, it's like versions==1 for data older than the TTL (one week here).  We want to allow that
> if an error is caught within a week of its happening -- user mistakenly removes a critical
> table -- then we'll be able to restore up to the moment just before catastrophe hit;
> otherwise, we keep one version only.
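
The retention rule in the description can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in MinVersions.diff: the class name, method name, and parameters are hypothetical, timestamps are plain millisecond values sorted newest first, and "minimum versions" is read as "keep at least this many of the newest versions even if they are past the TTL".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the TTL-plus-minimum-versions rule; not HBase code.
final class MinVersionsSketch {

    /**
     * Returns the timestamps that survive garbage collection.
     * Unexpired versions are always kept; an expired version is kept only
     * while fewer than minVersions versions have been kept so far.
     * The input list must be sorted newest first.
     */
    static List<Long> retain(List<Long> timestampsNewestFirst,
                             long now, long ttlMillis, int minVersions) {
        List<Long> kept = new ArrayList<>();
        for (long ts : timestampsNewestFirst) {
            boolean expired = now - ts > ttlMillis;
            if (!expired || kept.size() < minVersions) {
                kept.add(ts);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        long day = 24L * 60 * 60 * 1000;
        long now = 30 * day;
        long oneWeek = 7 * day;

        // Two versions inside the week, two older ones: only the recent two remain.
        System.out.println(retain(List.of(29 * day, 25 * day, 20 * day, 10 * day), now, oneWeek, 1));

        // Every version is older than a week: the last one written is still kept.
        System.out.println(retain(List.of(20 * day, 10 * day), now, oneWeek, 1));
    }
}
```

Setting minVersions to 0 in this sketch gives the plain TTL behavior, where every expired version is dropped.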

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
