hbase-issues mailing list archives

From "Jerry Lam (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-6757) Very inefficient behaviour of scan using FilterList
Date Tue, 11 Sep 2012 14:51:08 GMT

     [ https://issues.apache.org/jira/browse/HBASE-6757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jerry Lam updated HBASE-6757:

    Attachment: DisplayFilter.java

The TestColumnPrefixFilter demonstrates the inefficiency of the scan by using DisplayFilter,
which logs every call to the filter's methods.

testEfficiencyWithoutFliterList scans only 3 KeyValues and returns, whereas testEfficiencyWithFliterList
scans 10002 KeyValues. The only difference between the two tests is that testEfficiencyWithFliterList
wraps the ColumnPrefixFilter in a FilterList and passes the FilterList to the scan instead
of the ColumnPrefixFilter itself.

For this to work, DisplayFilter must first be deployed to HBase. Its output is written to
the HMaster log.
> Very inefficient behaviour of scan using FilterList
> ---------------------------------------------------
>                 Key: HBASE-6757
>                 URL: https://issues.apache.org/jira/browse/HBASE-6757
>             Project: HBase
>          Issue Type: Improvement
>          Components: filters
>    Affects Versions: 0.90.6
>            Reporter: Jerry Lam
>         Attachments: CopyOfTestColumnPrefixFilter.java, DisplayFilter.java
> The behaviour of scan is very inefficient when used with FilterList.
> When Operator.MUST_PASS_ALL is used, the FilterList rewrites a filter's return code
> from NEXT_ROW to SKIP.
> This happens when using ColumnPrefixFilter. Even though the ColumnPrefixFilter indicates
> that the scan should jump to NEXT_ROW because no further match can be found, the scan
> continues through all versions of a column in that row and all columns of that row,
> because the ReturnCode from ColumnPrefixFilter has been rewritten by the FilterList
> from NEXT_ROW to SKIP.
> This is particularly inefficient when a column has many versions, because the check
> is performed on every version of the column instead of only once on the column
> qualifier.
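
The effect described above can be illustrated with a small self-contained model. This is only a sketch and does not use the HBase API: CellFilter, prefixFilter, mustPassAll and scanRow are hypothetical names invented for the illustration, the ReturnCode enum is reduced to three values, and the wrapper simply assumes the rewriting behaviour reported in this issue. It counts how many cells the scan loop examines in one row with and without the wrapper.

```java
import java.util.Arrays;
import java.util.List;

class FilterListSketch {
    // Toy return codes; the real HBase enum has more values.
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    // Hypothetical single-method stand-in for an HBase Filter.
    interface CellFilter {
        ReturnCode filterCell(String qualifier);
    }

    // Simplified prefix filter. Qualifiers arrive in sorted order, so once a
    // qualifier sorts after the prefix, nothing later in the row can match.
    static CellFilter prefixFilter(String prefix) {
        return qualifier -> {
            if (qualifier.startsWith(prefix)) {
                return ReturnCode.INCLUDE;
            }
            return qualifier.compareTo(prefix) < 0 ? ReturnCode.SKIP : ReturnCode.NEXT_ROW;
        };
    }

    // Assumed model of the reported behaviour: any non-INCLUDE answer from a
    // child is passed on as SKIP, so a NEXT_ROW hint never reaches the scan.
    static CellFilter mustPassAll(List<CellFilter> children) {
        return qualifier -> {
            for (CellFilter child : children) {
                if (child.filterCell(qualifier) != ReturnCode.INCLUDE) {
                    return ReturnCode.SKIP;
                }
            }
            return ReturnCode.INCLUDE;
        };
    }

    // Walks every version of every column in one row and returns the number
    // of cells handed to the filter before the row is abandoned or exhausted.
    static int scanRow(List<String> sortedQualifiers, int versions, CellFilter filter) {
        int examined = 0;
        for (String qualifier : sortedQualifiers) {
            for (int v = 0; v < versions; v++) {
                examined++;
                if (filter.filterCell(qualifier) == ReturnCode.NEXT_ROW) {
                    return examined;
                }
            }
        }
        return examined;
    }

    static final List<String> QUALIFIERS = Arrays.asList("a", "b", "c");

    // Filter passed to the scan directly.
    static int examinedDirect(int versions) {
        return scanRow(QUALIFIERS, versions, prefixFilter("a"));
    }

    // Same filter wrapped in the MUST_PASS_ALL-style list.
    static int examinedWrapped(int versions) {
        return scanRow(QUALIFIERS, versions, mustPassAll(Arrays.asList(prefixFilter("a"))));
    }

    public static void main(String[] args) {
        System.out.println("direct:  " + examinedDirect(100));
        System.out.println("wrapped: " + examinedWrapped(100));
    }
}
```

In this model the direct scan stops at the first cell of column "b", while the wrapped scan visits every version of "b" and "c" as well. The gap grows with the number of versions per column, which matches the pattern reported above, although the counts here are not the ones from the attached test.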

