Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hbase.apache.org
Date: Tue, 22 Sep 2015 16:51:05 +0000 (UTC)
From: "Andrew Purtell (JIRA)" <jira@apache.org>
To: dev@hbase.apache.org
Message-ID: <JIRA.12863043.1441890106000.40497.1442940665181@Atlassian.JIRA>
In-Reply-To: <JIRA.12863043.1441890106000@Atlassian.JIRA>
References: <JIRA.12863043.1441890106000@Atlassian.JIRA>
 <JIRA.12863043.1441890106205@arcas>
Subject: [jira] [Reopened] (HBASE-14397) PrefixFilter doesn't filter all
 remaining rows if the prefix is longer than rowkey being compared
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HBASE-14397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell reopened HBASE-14397:
------------------------------------

Sorry I'm late. I have some concerns with this patch. 

Why skip only some rows where the prefix is longer than the key being compared? Why not all? Why make it a fixed number? Why is there no logging or indication that the max skip limit was reached? This hasn't fixed the problem; it has added a different type of surprising behavior.

I think we should amend or revert and recommit the current patch with an update that removes the "skippedCompareRows" stuff. Just skip the rows if the keys aren't long enough to match the prefix, and document this _consistent_ behavior. 

> PrefixFilter doesn't filter all remaining rows if the prefix is longer than rowkey being compared
> -------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-14397
>                 URL: https://issues.apache.org/jira/browse/HBASE-14397
>             Project: HBase
>          Issue Type: Improvement
>          Components: Filters
>    Affects Versions: 2.0.0
>            Reporter: Jianwei Cui
>            Assignee: Jianwei Cui
>            Priority: Minor
>             Fix For: 2.0.0, 1.3.0
>
>         Attachments: HBASE-14397-trunk-v1.patch
>
>
> The PrefixFilter will filter rowkey as:
> {code}
>   public boolean filterRowKey(Cell firstRowCell) {
>     ...
>     int length = firstRowCell.getRowLength();
>     if (length < prefix.length) return true; // ===> return directly if the prefix is longer
>     ....
>     if ((!isReversed() && cmp > 0) || (isReversed() && cmp < 0)) {
>       passedPrefix = true;
>     }
>     filterRow = (cmp != 0);
>     return filterRow;
>   }
> {code}
> If the prefix is longer than the current rowkey, PrefixFilter#filterRowKey filters the rowkey directly without comparing it, so the 'passedPrefix' flag is never set even if the current row sorts after the prefix.
> For example, if there are three rows 'a', 'b' and 'c' in the table, and we issue a scan request as:
> {code}
> hbase(main):001:0> scan 'test_table', {STARTROW => 'a', FILTER => "(PrefixFilter ('aa'))"}
> {code}
> The region server will check all three rows before returning. In our production cluster, a user issued a scan with a PrefixFilter whose prefix was longer than the rowkeys of the following millions of rows, so the region server kept checking rows until it hit a rowkey at least as long as the prefix. This makes the client time out easily. To fix this case, it seems we need to compare the prefix with the rowkey every several rows even when the prefix is longer.
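
For illustration only, here is a minimal sketch of one way to avoid scanning past the prefix: compare the bytes a short rowkey does have against the start of the prefix, so the flag can still be set. This is not the attached patch and not the HBase implementation. The class PrefixSketch is hypothetical, works on plain byte arrays instead of Cell, and assumes a forward (non-reversed) scan.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for PrefixFilter: plain byte[] rowkeys, forward scans only. */
final class PrefixSketch {
    private final byte[] prefix;
    private boolean passedPrefix = false;

    PrefixSketch(byte[] prefix) {
        this.prefix = prefix;
    }

    /** Returns true when the row should be filtered out. */
    boolean filterRowKey(byte[] row) {
        // Compare only the bytes both arrays have, as unsigned values.
        int n = Math.min(row.length, prefix.length);
        int cmp = 0;
        for (int i = 0; i < n && cmp == 0; i++) {
            cmp = (row[i] & 0xff) - (prefix[i] & 0xff);
        }
        // A larger byte inside the shared range means this row, and every
        // later row of a forward scan, sorts after all keys with the prefix.
        if (cmp > 0) {
            passedPrefix = true;
        }
        // The row matches only if it is long enough and the shared bytes are equal.
        return cmp != 0 || row.length < prefix.length;
    }

    /** Mirrors the role of filterAllRemaining(): true once the scan can stop. */
    boolean filterAllRemaining() {
        return passedPrefix;
    }

    public static void main(String[] args) {
        PrefixSketch filter = new PrefixSketch("aa".getBytes(StandardCharsets.UTF_8));
        for (String row : new String[] {"a", "b", "c"}) {
            if (filter.filterAllRemaining()) {
                System.out.println("stop before row " + row);
                break;
            }
            boolean filtered = filter.filterRowKey(row.getBytes(StandardCharsets.UTF_8));
            System.out.println(row + " filtered=" + filtered);
        }
    }
}
```

With the rows 'a', 'b' and 'c' and the prefix 'aa' from the example above, row 'a' is filtered without setting the flag because it is a proper prefix of 'aa', row 'b' sets the flag, and row 'c' is never compared. Because every short rowkey is compared, this needs no skip counter.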


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)