hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "jiraposter@reviews.apache.org (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4071) Data GC: Remove all versions > TTL EXCEPT the last written version
Date Wed, 24 Aug 2011 01:55:31 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4071?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089940#comment-13089940
] 

jiraposter@reviews.apache.org commented on HBASE-4071:
------------------------------------------------------



bq.  On 2011-08-24 00:03:43, Michael Stack wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java,
line 127
bq.  > <https://reviews.apache.org/r/1582/diff/7/?file=34607#file34607line127>
bq.  >
bq.  >     I defer to Jon's review here (he knows this stuff...)

This basically just counts versions up instead of down (see next comment below as to why).


bq.  On 2011-08-24 00:03:43, Michael Stack wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java,
line 205
bq.  > <https://reviews.apache.org/r/1582/diff/7/?file=34607#file34607line205>
bq.  >
bq.  >     Why this change?

ExplicitColumnTracker used to count versions down and ScanWildcardColumnTracker is counting
version up.
This change goes to together with the previous change (increment vs decrement).

I did that for two reasons:

1. When both column trackers follow similar logic it will be easier to change them and to
simplify them further in the future (such as extracting a GC policy).

2. Now the comparison for versions can be:
 if(count >= maxVersions || (count >= minVersions && isExpired(timestamp))
    // Done with versions for this column

whereas otherwise it would need to be:
 if(count == 0) || ((maxVersion-count) < minVersions && isExpired(timestamp))
    // Done with versions for this column

Which is harder to read and less intuitive, I think.


bq.  On 2011-08-24 00:03:43, Michael Stack wrote:
bq.  > http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java,
line 47
bq.  > <https://reviews.apache.org/r/1582/diff/7/?file=34606#file34606line47>
bq.  >
bq.  >     javadoc doesn't match param name -- I can fix on commit.

Argh... One more that I missed. Thanks Stack!


- Lars


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1582/#review1602
-----------------------------------------------------------


On 2011-08-23 05:13:38, Lars Hofhansl wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/1582/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-08-23 05:13:38)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Allow enforcing a minimum number of versions when TTL is enable for a store.
bq.  The GC logic for both versions and TTL is unified inside the ColumnTrackers.
bq.  
bq.  
bq.  This addresses bug HBASE-4071.
bq.      https://issues.apache.org/jira/browse/HBASE-4071
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
1160440 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
1160440 
bq.  
bq.  Diff: https://reviews.apache.org/r/1582/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran all tests. I get error (not failures) from two: TestDistributedLogSplitting and TestHTablePool.
Both fail with or without my changes.
bq.  New tests: TestMinVersions.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.



> Data GC: Remove all versions > TTL EXCEPT the last written version
> ------------------------------------------------------------------
>
>                 Key: HBASE-4071
>                 URL: https://issues.apache.org/jira/browse/HBASE-4071
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: stack
>            Assignee: Lars Hofhansl
>         Attachments: MinVersions.diff
>
>
> We were chatting today about our backup cluster.  What we want is to be able to restore
the dataset from any point of time but only within a limited timeframe -- say one week.  Thereafter,
if the versions are older than one week, rather than as we do with TTL where we let go of
all versions older than TTL, instead, let go of all versions EXCEPT the last one written.
 So, its like versions==1 when TTL > one week.  We want to allow that if an error is caught
within a week of its happening -- user mistakenly removes a critical table -- then we'll be
able to restore up the the moment just before catastrophe hit otherwise, we keep one version
only.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message