hbase-issues mailing list archives

From "Jesse Yates (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8809) Include deletes in the scan (setRaw) method does not respect the time range or the filter
Date Fri, 05 Jul 2013 20:39:49 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13701107#comment-13701107 ]

Jesse Yates commented on HBASE-8809:
------------------------------------

As a slight follow-up to this, it feels like raw scans should also ignore the column version/timestamp
filtering. In particular, I'm talking about this section in ScanQueryMatcher:
{code}
    MatchCode colChecker = columns.checkColumn(bytes, offset, qualLength,
        timestamp, type, kv.getMemstoreTS() > maxReadPointToTrackVersions);
    /*
     * According to current implementation, colChecker can only be
     * SEEK_NEXT_COL, SEEK_NEXT_ROW, SKIP or INCLUDE. Therefore, always return
     * the MatchCode. If it is SEEK_NEXT_ROW, also set stickyNextRow.
     */
    ...
{code}

Here the ScanWildcardColumnTracker does not ignore the timestamp, even in the simple case: of four
puts to the same row with different timestamps, the oldest is dropped by default, even though
it's still "present" in the store, regardless of whether the scan is raw.
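
To make the case above concrete, here is a minimal, self-contained sketch of per-column version counting. It is a hypothetical model, not the ScanWildcardColumnTracker code: the class name, the emitted() method, and the raw flag are made up for illustration, and the limit of three versions is an assumption chosen to match the "four puts, oldest dropped" case.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of per-column version counting; not HBase code.
final class VersionLimitSketch {

    /**
     * Returns the timestamps a scan would emit for one column.
     * Timestamps are expected newest first.
     */
    static List<Long> emitted(long[] newestFirst, int maxVersions, boolean raw) {
        List<Long> out = new ArrayList<>();
        for (long ts : newestFirst) {
            // Suggested behavior: a raw scan skips the version-count check.
            if (!raw && out.size() >= maxVersions) {
                break; // a normal scan drops the remaining, older versions
            }
            out.add(ts);
        }
        return out;
    }

    public static void main(String[] args) {
        long[] puts = {4L, 3L, 2L, 1L}; // four puts, newest first
        System.out.println(emitted(puts, 3, false)); // [4, 3, 2]
        System.out.println(emitted(puts, 3, true));  // [4, 3, 2, 1]
    }
}
```

With raw set to false the oldest put is filtered out; with raw set to true all four come back, which is the behavior being suggested for raw scans.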

Thoughts?
                
> Include deletes in the scan (setRaw) method does not respect the time range or the filter
> -----------------------------------------------------------------------------------------
>
>                 Key: HBASE-8809
>                 URL: https://issues.apache.org/jira/browse/HBASE-8809
>             Project: HBase
>          Issue Type: Bug
>          Components: Scanners
>            Reporter: Vasu Mariyala
>            Assignee: Lars Hofhansl
>             Fix For: 0.98.0, 0.95.2, 0.94.10
>
>         Attachments: 8809-0.94.txt, 8809-trunk.txt, DeleteMarkers.doc
>
>
> If a row has been deleted at time stamp 'T' and a scan with time range (0, T-1) is executed,
> it still returns the delete marker at time stamp 'T'. It is because of the code in ScanQueryMatcher.java
> {code}
>       if (retainDeletesInOutput
>           || (!isUserScan && (EnvironmentEdgeManager.currentTimeMillis() - timestamp) <= timeToPurgeDeletes)
>           || kv.getMemstoreTS() > maxReadPointToTrackVersions) {
>         // always include or it is not time yet to check whether it is OK
>         // to purge deletes or not
>         return MatchCode.INCLUDE;
>       }
> {code}
> The assumption is that a scan (even with setRaw set to true) should respect the filters and
> the time range specified.
> Please let me know if you think this behavior can be changed so that I can provide a
> patch for it.
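
The reported and requested behavior for the delete marker can be sketched as follows. This is a hypothetical model rather than the ScanQueryMatcher source or the attached patch: the class and method names are invented, only the retainDeletesInOutput condition from the quoted snippet is modeled (the isUserScan and memstore checks are left out), and the time range is assumed to be half-open, [minStamp, maxStamp).

```java
// Hypothetical, simplified model of the decision discussed above; not the HBase source.
final class DeleteMarkerSketch {

    /** True if ts falls in the half-open range [minStamp, maxStamp) (assumed). */
    static boolean inTimeRange(long ts, long minStamp, long maxStamp) {
        return ts >= minStamp && ts < maxStamp;
    }

    /** Reported behavior: a raw scan returns the marker regardless of the time range. */
    static boolean includeReported(boolean retainDeletesInOutput, long markerTs,
                                   long minStamp, long maxStamp) {
        return retainDeletesInOutput;
    }

    /** Requested behavior: the marker is returned only if it is inside the time range. */
    static boolean includeRequested(boolean retainDeletesInOutput, long markerTs,
                                    long minStamp, long maxStamp) {
        return retainDeletesInOutput && inTimeRange(markerTs, minStamp, maxStamp);
    }

    public static void main(String[] args) {
        long t = 100L; // row deleted at time stamp T
        System.out.println(includeReported(true, t, 0L, t - 1));  // true
        System.out.println(includeRequested(true, t, 0L, t - 1)); // false
    }
}
```

For a delete marker at T and a scan over (0, T-1), the first method still includes the marker while the second does not.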

