hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4536) Allow CF to retain deleted rows
Date Sat, 15 Oct 2011 04:00:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13128054#comment-13128054
] 

jiraposter@reviews.apache.org commented on HBASE-4536:
------------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2178/#review2609
-----------------------------------------------------------


Great stuff.


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
<https://reviews.apache.org/r/2178/#comment5854>

    Javadoc what the new param means



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
<https://reviews.apache.org/r/2178/#comment5855>

    So you made this change to make it so user doesn't have condition where they are scratching
their head trying to figure why something is not being let go though TTL is four hours?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
<https://reviews.apache.org/r/2178/#comment5856>

    Is this value read often?  If so you might want to cache it.  Profiling these things can
sometimes show up as costing.
    
    Not important but if you get to make new patch do
    
    if (test) block
    
    or
    
    if (test) {
      block
    }
    
    rather than the 
    
    if (test)
      block
    
    you have.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java
<https://reviews.apache.org/r/2178/#comment5857>

    Let this be the convention from here on out.  Maybe even two leading underscores and no
trailing underscore... but whatever, let this be the convention for system configs in attribute
maps going forward.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java
<https://reviews.apache.org/r/2178/#comment5858>

    Good.
    
    There is a real old feature request in hbase that asked for this.. let me find the issue....
I can't find it.... user wanted to see deletes too in a scan.
    
    Then with Benoit a week or so back we had to look in hfile because we didn't trust a delete....
we thought there stuff behind it.  We could have used this facility to show all that was in
hbase (if we'd enabled this new feature of course).
    
    I think this facility should be on by default.  Whats the downsides?  We don't let go
of stuff if < maxversions?
    
    How you going to do versions now?  If I ask for a scan beyond a delete will I see maxversions
only  because thats all we keep between deletes?  Will it be the case that I will end up having:
    
    maxversions... delete marker .... maxversions ... delete marker .... maxversions...
    
    and so on if this facility is enabled?
    
    I won't have:
    
    as many versions as was written but we only return maxversions......delete marker......
as many versions as were written...... delete marker...... 
    
    Is that right?  (If you follow me).
    
    Say 'garbage collected' in the comment rather than just 'collected' 



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
<https://reviews.apache.org/r/2178/#comment5859>

    Bad name for a param.  This is name you'd give the getter for a param named 'delete'.
    
    What is this param doing?  Is this the raw attribute or is it from HCD saying keep deletes?
 Make it clear in javadoc.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
<https://reviews.apache.org/r/2178/#comment5860>

    good



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
<https://reviews.apache.org/r/2178/#comment5861>

    good



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
<https://reviews.apache.org/r/2178/#comment5862>

    This stuff needs to make it out into the hbase book.  Good stuff.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
<https://reviews.apache.org/r/2178/#comment5863>

    So when I am doing a check for max versions = 3 and I have for a single cell:
    
    
    put ... put....delete....put
    
    
    If I keep delete markers and do a raw scan, I'll get: p p d
    
    If I have keep delete markers but not a raw scan I get: p p
    
    If I do not have keep delete markers and its not an ordinary scan: p p
    
    Something like that?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
<https://reviews.apache.org/r/2178/#comment5864>

    I'm not sure I get what this one is about.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
<https://reviews.apache.org/r/2178/#comment5866>

    An 'of' should be an 'if' here?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
<https://reviews.apache.org/r/2178/#comment5867>

    These params on an SQM are starting to build up.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
<https://reviews.apache.org/r/2178/#comment5868>

    I don't understand whats up here.  Why we changing timerange if seePastDeleteMarker is
set?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
<https://reviews.apache.org/r/2178/#comment5869>

    Is 'keepDeletedRows' a bad name for this variable?  Should it be keepDeletedCells?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
<https://reviews.apache.org/r/2178/#comment5870>

    So, this is a delete kv at this stage of the processing?  So, if its older than any put
in any storefile on major compactions, we can let it go since has no effect?  Thats good.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
<https://reviews.apache.org/r/2178/#comment5871>

    This makes me nervous.  All old tests pass on this code?  You have a unit test to test
that we are doing the right thing in here both for the old and the new handling?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
<https://reviews.apache.org/r/2178/#comment5872>

    Again, bad name for param.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
<https://reviews.apache.org/r/2178/#comment5873>

    Is this right?  We should count it as a version if keepdeletes is on?  Unless this flag
is saying 'keepdeletes' and not 'this is a delete kv' as I'm interpreting it).



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
<https://reviews.apache.org/r/2178/#comment5874>

    No biggie... previous formatting was better.



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
<https://reviews.apache.org/r/2178/#comment5875>

    This belongs in HCD rather than up here in this global HConstants?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
<https://reviews.apache.org/r/2178/#comment5876>

    Why we move this to the end?



http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb
<https://reviews.apache.org/r/2178/#comment5877>

    Very nice.



http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeepDeletes.java
<https://reviews.apache.org/r/2178/#comment5878>

    Good test



http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeepDeletes.java
<https://reviews.apache.org/r/2178/#comment5879>

    White space


- Michael


On 2011-10-12 04:55:53, Lars Hofhansl wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2178/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-10-12 04:55:53)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  HBase timerange Gets and Scans allow to do "timetravel" in HBase. I.e. look at the state
of the data at any point in the past, provided the data is still around.
bq.  This did not work for deletes, however. Deletes would always mask all puts in the past.
bq.  This change adds a flag that can be on HColumnDescriptor to enable retention of deleted
rows.
bq.  These rows are still subject to TTL and/or VERSIONS.
bq.  
bq.  This changes the following:
bq.  1. There is a new flag on HColumnDescriptor enabling that behavior.
bq.  2. Allow gets/scans with a timerange to retrieve rows hidden by a delete marker, if the
timerange does not include the delete marker.
bq.  3. Do not unconditionally collect all deleted rows during a compaction.
bq.  4. Allow a "raw" Scan, which retrieves all delete markers and deleted rows.
bq.  
bq.  The change is small'ish, but the logic is intricate, so please review carefully.
bq.  
bq.  
bq.  This addresses bug HBASE-4536.
bq.      https://issues.apache.org/jira/browse/HBASE-4536
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
1181616 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java
1181616 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ColumnTracker.java
1181616 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ExplicitColumnTracker.java
1181616 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java
1181616 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ScanWildcardColumnTracker.java
1181616 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
1181616 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
1181616 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
1181616 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/ruby/hbase/admin.rb 1181616 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
1181616 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestExplicitColumnTracker.java
1181616 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestKeepDeletes.java
PRE-CREATION 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestMinVersions.java
1181616 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWildcardColumnTracker.java
1181616 
bq.  
bq.  Diff: https://reviews.apache.org/r/2178/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  All tests pass now.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Lars
bq.  
bq.


                
> Allow CF to retain deleted rows
> -------------------------------
>
>                 Key: HBASE-4536
>                 URL: https://issues.apache.org/jira/browse/HBASE-4536
>             Project: HBase
>          Issue Type: New Feature
>          Components: regionserver
>    Affects Versions: 0.92.0
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.0
>
>
> Parent allows for a cluster to retain rows for a TTL or keep a minimum number of versions.
> However, if a client deletes a row all version older than the delete tomb stone will
be remove at the next major compaction (and even at memstore flush - see HBASE-4241).
> There should be a way to retain those version to guard against software error.
> I see two options here:
> 1. Add a new flag HColumnDescriptor. Something like "RETAIN_DELETED".
> 2. Folds this into the parent change. I.e. keep minimum-number-of-versions of versions
even past the delete marker.
> #1 would allow for more flexibility. #2 comes somewhat naturally with parent (from a
user viewpoint)
> Comments? Any other options?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Mime
View raw message