hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16225) Refactor ScanQueryMatcher
Date Tue, 26 Jul 2016 07:55:20 GMT

    https://issues.apache.org/jira/browse/HBASE-16225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15393389#comment-15393389

Duo Zhang commented on HBASE-16225:

You mean delete markers are now passed to filters (the per-cell filter)?
In the old SQM implementation, if KeepDeletedCells is TRUE, then we will not track delete
markers. Instead, we will use the column tracker to check whether there are already enough
versions so that we can drop a delete marker. On this code path, we will pass the delete marker
to a filter. Of course, KeepDeletedCells can only be TRUE during a compaction or a raw scan,
and if there is no coprocessor hook then we will not have a filter during compaction; this is
also why I said above that we should disable filters for raw scans.

So this is for cases where there are special CPs written to deal with delete markers in compaction?
I think most CPs do not need to deal with delete markers; usually they only want to drop some
cells during compaction. I mean that if a CP really wants to deal with delete markers, it had
better implement a new scanner instead of using a filter, since the delete logic in HBase is
really complicated. And for the normal CPs that only want to drop some cells during compaction,
I suggest we add a new type of filter which is only used for compaction. This is safer and clearer.
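To make the suggestion concrete, a compaction-only filter could look roughly like the toy model below. This is a minimal sketch, not HBase code: `CompactionFilter`, `SimpleCell`, and `compact` are hypothetical names invented for illustration. The point it shows is that such a filter would only ever see ordinary cells, while delete markers bypass it entirely, so a CP author cannot accidentally break delete semantics.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical compaction-only filter: it is consulted only for real
// cells, never for delete markers.
interface CompactionFilter {
    // Return true to drop the cell from the compacted output.
    boolean dropCell(String row, long timestamp);
}

// Minimal stand-in for a cell: a row key, a timestamp, and a flag
// marking whether it is a delete marker.
class SimpleCell {
    final String row;
    final long ts;
    final boolean isDeleteMarker;
    SimpleCell(String row, long ts, boolean isDeleteMarker) {
        this.row = row; this.ts = ts; this.isDeleteMarker = isDeleteMarker;
    }
}

public class CompactionFilterSketch {
    // Apply the filter during "compaction": delete markers skip the
    // filter and are left to the normal delete-tracking logic.
    static List<SimpleCell> compact(List<SimpleCell> input, CompactionFilter f) {
        List<SimpleCell> out = new ArrayList<>();
        for (SimpleCell c : input) {
            if (c.isDeleteMarker || !f.dropCell(c.row, c.ts)) {
                out.add(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<SimpleCell> cells = new ArrayList<>();
        cells.add(new SimpleCell("r1", 3, false));
        cells.add(new SimpleCell("r1", 2, true));  // delete marker
        cells.add(new SimpleCell("r1", 1, false));
        // Drop everything older than ts 2; the delete marker still survives.
        List<SimpleCell> out = compact(cells, (row, ts) -> ts < 2);
        System.out.println(out.size()); // the marker and the ts-3 cell remain
    }
}
```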

What do you reckon? If we change the behaviour, some existing use cases built with filters
may break. But I think those use cases are not reliable anyway.

For example, with max versions = 2 and 3 cells with timestamps T1 < T2 < T3:

For a normal scan, T3 and T2 are returned.
If your filter eats T3, then T2 and T1 are returned.

But this is not reliable. After a compaction (it does not need to be a major compaction), T1
will be gone forever and you cannot get it with your filter...
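The instability described above can be sketched with a toy model in plain Java (no HBase classes; `scan` and its version-counting rule are simplifications invented for illustration). The model assumes filtered-out cells do not count toward the version limit during a scan, while compaction keeps only the newest max-versions cells regardless of any filter:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.LongPredicate;

public class MaxVersionsSketch {
    // Walk cells newest-first; a cell the filter "eats" does not count
    // toward maxVersions, so older cells can fall through.
    static List<Long> scan(List<Long> stored, int maxVersions, LongPredicate filterKeeps) {
        List<Long> sorted = new ArrayList<>(stored);
        sorted.sort(Comparator.reverseOrder()); // newest first
        List<Long> result = new ArrayList<>();
        for (long ts : sorted) {
            if (!filterKeeps.test(ts)) continue; // filter eats this cell
            result.add(ts);
            if (result.size() == maxVersions) break;
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> cells = List.of(1L, 2L, 3L); // T1 < T2 < T3
        // Normal scan: T3 and T2.
        System.out.println(scan(cells, 2, ts -> true));      // [3, 2]
        // Filter eats T3: the scan falls through to T2 and T1.
        System.out.println(scan(cells, 2, ts -> ts != 3));   // [2, 1]
        // Compaction keeps only the newest two cells on disk; T1 is gone.
        List<Long> compacted = scan(cells, 2, ts -> true);   // [3, 2]
        // The same filtered scan can no longer reach T1.
        System.out.println(scan(compacted, 2, ts -> ts != 3)); // [2]
    }
}
```

The same filter thus returns [T2, T1] before the compaction but only [T2] after it, which is exactly why relying on this behaviour is fragile.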


> Refactor ScanQueryMatcher
> -------------------------
>                 Key: HBASE-16225
>                 URL: https://issues.apache.org/jira/browse/HBASE-16225
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>         Attachments: HBASE-16225-v1.patch, HBASE-16225-v2.patch, HBASE-16225.patch
> As said in HBASE-16223, the code of {{ScanQueryMatcher}} is too complicated. I suggest
> that we abstract an interface and implement several subclasses which separate the different
> logic into different implementations. For example, the requirements of compaction and user
> scan are different; right now we also need to consider the logic of user scan even if we
> only want to add logic for compaction. And at least, the raw scan does not need a query
> matcher... we can implement a dummy query matcher for it.
> Suggestions are welcomed. Thanks.

This message was sent by Atlassian JIRA
