hbase-issues mailing list archives

From "Prakash Khemani (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5010) Filter HFiles based on TTL
Date Fri, 27 Jan 2012 04:53:43 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13194454#comment-13194454 ]

Prakash Khemani commented on HBASE-5010:
----------------------------------------

This change doesn't break HBASE-4721.

HBASE-4721 introduced another parameter called hbase.hstore.time.to.purge.deletes to keep
deletes even after major compactions. But hbase.hstore.time.to.purge.deletes doesn't override
the TTL of the store.

Pasting the comment from the code, which hopefully makes it clear that this diff works with HBASE-4721:

  // By default, when hbase.hstore.time.to.purge.deletes is 0ms, a delete
  // marker is always removed during a major compaction. If set to non-zero
  // value then major compaction will try to keep a delete marker around for
  // the given number of milliseconds. We want to keep the delete markers
  // around a bit longer because old puts might appear out-of-order. For
  // example, during log replication between two clusters.
  //
  // If the delete marker has lived longer than its column-family's TTL then
  // the delete marker will be removed even if time.to.purge.deletes has not
  // passed. This is because all the Puts that this delete marker can influence
  // would have also expired. (Removing of delete markers on col family TTL will
  // not happen if min-versions is set to non-zero)
  //
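
To make the rule in that comment concrete, here is a minimal sketch of the decision it describes. This is not the code from the patch: the class name, method name, and parameters are hypothetical, and measuring the marker's age from its own timestamp is an assumption.

```java
// Hypothetical illustration of the rule in the comment above; not HBase code.
final class DeleteMarkerRetention {

    /**
     * Returns true if a major compaction may drop the delete marker.
     *
     * Assumption: the marker's age is measured as now - markerTs.
     */
    static boolean canPurgeDeleteMarker(long markerTs, long now,
                                        long timeToPurgeDeletesMs,
                                        long ttlMs, int minVersions) {
        // Default case: with time.to.purge.deletes at 0ms the marker is always removed.
        if (timeToPurgeDeletesMs == 0) {
            return true;
        }
        long age = now - markerTs;
        // The marker has been kept around for the configured time.
        if (age > timeToPurgeDeletesMs) {
            return true;
        }
        // The marker outlived the column family's TTL, so every Put it could
        // influence has expired too. Skipped when min-versions is non-zero.
        return minVersions == 0 && age > ttlMs;
    }
}
```

In this sketch a marker that is 5 seconds old with a 10 second time.to.purge.deletes is kept when the TTL is unlimited, but dropped when the TTL is 1 second and min-versions is 0.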
                
> Filter HFiles based on TTL
> --------------------------
>
>                 Key: HBASE-5010
>                 URL: https://issues.apache.org/jira/browse/HBASE-5010
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>             Fix For: 0.94.0
>
>         Attachments: 5010.patch, D1017.1.patch, D1017.2.patch, D909.1.patch, D909.2.patch, D909.3.patch, D909.4.patch, D909.5.patch, D909.6.patch
>
>
> In ScanWildcardColumnTracker we have
> {code:java}
>  
>   this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl;
>   ...
>   private boolean isExpired(long timestamp) {
>     return timestamp < oldestStamp;
>   }
> {code}
> but this time range filtering does not participate in HFile selection. In one real case
> this caused next() calls to time out because all KVs in a table got expired, but next() had
> to iterate over the whole table to find that out. We should be able to filter out those HFiles
> right away. I think a reasonable approach is to add a "default timerange filter" to every
> scan for a CF with a finite TTL and utilize existing filtering in StoreFile.Reader.passesTimerangeFilter.
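
The idea in the description can be sketched roughly as follows. This is only an illustration, not the attached patch: FileInfo and selectUnexpired are hypothetical names, each file is assumed to record the maximum timestamp of its KVs, and settings such as min-versions are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of skipping fully expired files; not HBase code.
final class TtlFileFilter {

    /** Stand-in for a store file's metadata: a name and its newest KV timestamp. */
    static final class FileInfo {
        final String name;
        final long maxTimestamp;

        FileInfo(String name, long maxTimestamp) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** Keeps only the files that can still contain at least one unexpired KV. */
    static List<FileInfo> selectUnexpired(List<FileInfo> files, long now, long ttlMs) {
        // Same cutoff as in the quoted snippet: oldestStamp = now - ttl.
        long oldestStamp = now - ttlMs;
        List<FileInfo> selected = new ArrayList<>();
        for (FileInfo f : files) {
            // A KV is expired when timestamp < oldestStamp, so a file whose
            // newest timestamp is below the cutoff holds only expired KVs.
            if (f.maxTimestamp >= oldestStamp) {
                selected.add(f);
            }
        }
        return selected;
    }
}
```

With this check a scan over a table whose KVs have all expired would select no files, instead of iterating over every KV to discover that nothing is left.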

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
