hbase-issues mailing list archives

From "Phabricator (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5010) Filter HFiles based on TTL
Date Thu, 22 Dec 2011 02:27:31 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13174574#comment-13174574 ]

Phabricator commented on HBASE-5010:
------------------------------------

Kannan has commented on the revision "[jira] [HBASE-5010] [89-fb] Filter HFiles based on TTL".

  Mikhail: Nice work, and nice unit test. Compaction-related comment inline.

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java:133
    Pre-existing issue: this constructor is no longer just for major compactions. Minor compactions also use it.
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java:154
    If we do a similar check in this constructor, we would get the same optimization for compactions as well.

  As we talked about offline, if all the data in a file has expired, then we can avoid adding
that file's scanner to the KeyValueHeap below. So for CFs whose data routinely expires due
to TTL, compactions would have to read a lot less data too, or could essentially turn into
featherweight ops that just delete unnecessary/old files.
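
  To make the idea above concrete, here is a minimal sketch of splitting compaction candidates into files that still need to be read and files that can simply be dropped. It assumes each file exposes the timestamp of its newest KV. The class and field names (ExpiredFileCompactionSketch, StoreFileInfo, Plan) are hypothetical and are not taken from the patch or from HBase.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpiredFileCompactionSketch {

    /** Hypothetical stand-in for per-file metadata; only the newest KV timestamp is needed. */
    public static final class StoreFileInfo {
        public final String name;
        public final long maxTimestamp;

        public StoreFileInfo(String name, long maxTimestamp) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /** Result of the split: files whose scanners go on the heap, and files with nothing live. */
    public static final class Plan {
        public final List<StoreFileInfo> toScan = new ArrayList<>();
        public final List<StoreFileInfo> toDrop = new ArrayList<>();
    }

    /**
     * A file is fully expired when even its newest KV is older than the cutoff,
     * using the same comparison as isExpired() in the issue description.
     */
    public static Plan plan(List<StoreFileInfo> candidates, long now, long ttl) {
        long oldestStamp = now - ttl;
        Plan plan = new Plan();
        for (StoreFileInfo file : candidates) {
            if (file.maxTimestamp < oldestStamp) {
                plan.toDrop.add(file);   // nothing live: no scanner needed
            } else {
                plan.toScan.add(file);   // may contain live KVs: must be read
            }
        }
        return plan;
    }
}
```

  If every candidate lands in toDrop, the compaction has nothing to read and reduces to removing files, which is the featherweight case described above.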

REVISION DETAIL
  https://reviews.facebook.net/D909

                
> Filter HFiles based on TTL
> --------------------------
>
>                 Key: HBASE-5010
>                 URL: https://issues.apache.org/jira/browse/HBASE-5010
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Mikhail Bautin
>            Assignee: Mikhail Bautin
>         Attachments: D909.1.patch
>
>
> In ScanWildcardColumnTracker we have
> {code:java}
>  
>   this.oldestStamp = EnvironmentEdgeManager.currentTimeMillis() - ttl;
>   ...
>   private boolean isExpired(long timestamp) {
>     return timestamp < oldestStamp;
>   }
> {code}
> but this time range filtering does not participate in HFile selection. In one real case
> this caused next() calls to time out because all KVs in a table got expired, but next() had
> to iterate over the whole table to find that out. We should be able to filter out those HFiles
> right away. I think a reasonable approach is to add a "default timerange filter" to every
> scan for a CF with a finite TTL and utilize existing filtering in StoreFile.Reader.passesTimerangeFilter.
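
The "default timerange filter" proposed in the description can be sketched as a range-overlap test. This is an illustration only: it assumes each file records the minimum and maximum timestamps of its KVs, and the class and method names below (DefaultTimeRangeSketch, overlaps, passesDefaultTtlRange) are hypothetical. It does not reproduce the actual logic of StoreFile.Reader.passesTimerangeFilter.

```java
public class DefaultTimeRangeSketch {

    /** Cutoff below which a KV counts as expired, as in the snippet from the description. */
    public static long oldestStamp(long now, long ttl) {
        return now - ttl;
    }

    /**
     * True when the file's inclusive range [fileMinTs, fileMaxTs] intersects
     * the scan's range [scanMinTs, scanMaxTs), where scanMaxTs is exclusive.
     */
    public static boolean overlaps(long fileMinTs, long fileMaxTs,
                                   long scanMinTs, long scanMaxTs) {
        return fileMaxTs >= scanMinTs && fileMinTs < scanMaxTs;
    }

    /** Applies the implicit TTL range [oldestStamp, Long.MAX_VALUE) to one file. */
    public static boolean passesDefaultTtlRange(long fileMinTs, long fileMaxTs,
                                                long now, long ttl) {
        return overlaps(fileMinTs, fileMaxTs, oldestStamp(now, ttl), Long.MAX_VALUE);
    }
}
```

With now = 1000 and ttl = 500 the cutoff is 500, so a file covering timestamps 0 to 499 is skipped, while a file whose newest KV is at exactly 500 is still read, because isExpired() treats only timestamps strictly below the cutoff as expired.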


        
