hbase-issues mailing list archives

From "cuijianwei (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HBASE-14397) PrefixFilter fail to filter all remainings if the prefix is longer than compared rowkey
Date Thu, 10 Sep 2015 13:59:46 GMT

     [ https://issues.apache.org/jira/browse/HBASE-14397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

cuijianwei updated HBASE-14397:
-------------------------------
    Description: 
PrefixFilter filters rowkeys as follows:
{code}
  public boolean filterRowKey(Cell firstRowCell) {
    ...
    int length = firstRowCell.getRowLength();
    if (length < prefix.length) return true; // ===> return directly if the prefix is longer
    ....
    if ((!isReversed() && cmp > 0) || (isReversed() && cmp < 0)) {
      passedPrefix = true;
    }
    filterRow = (cmp != 0);
    return filterRow;
  }
{code}
If the prefix is longer than the current rowkey, PrefixFilter#filterRowKey filters the
row immediately without comparing it against the prefix, so the 'passedPrefix' flag is
never set, even when the current row sorts after the prefix.
For example, suppose the table contains three rows 'a', 'b' and 'c', and we issue the
following scan request:
{code}
hbase(main):001:0> scan 'test_table', {STARTROW => 'a', FILTER => "(PrefixFilter ('aa'))"}
{code}
The region server will check all three rows before returning. In our production
environment, a user issued a scan with a PrefixFilter whose prefix was longer than the
rowkeys of the following millions of rows, so the region server kept checking rows until
it hit a rowkey at least as long as the prefix. This easily causes the client to time out.
To fix this case, it seems we need to compare the prefix with the rowkey every several
rows, even when the prefix is longer.
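
The behaviour described above can be modelled without HBase. The following is a minimal sketch, not the actual patch: the class PrefixFilterSketch and the helper rowsExamined are hypothetical names, it covers forward scans only (isReversed() is ignored), and it compares on every row rather than every several rows. It only shows that comparing over the shared length lets 'passedPrefix' be set when the rowkey is shorter than the prefix.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Standalone model of the filter; NOT org.apache.hadoop.hbase.filter.PrefixFilter.
class PrefixFilterSketch {
    private final byte[] prefix;
    private boolean passedPrefix = false;

    PrefixFilterSketch(String prefix) {
        this.prefix = prefix.getBytes(StandardCharsets.UTF_8);
    }

    // Plays the role of filterAllRemaining(): true once no later row can match.
    boolean filterAllRemaining() {
        return passedPrefix;
    }

    // Returns true when the row must be filtered out (forward scans only).
    boolean filterRowKey(byte[] row) {
        // Compare over the shared length, even when the row is shorter than the prefix.
        int n = Math.min(row.length, prefix.length);
        int cmp = 0;
        for (int i = 0; i < n && cmp == 0; i++) {
            cmp = (row[i] & 0xff) - (prefix[i] & 0xff); // unsigned byte order
        }
        if (cmp > 0) {
            passedPrefix = true; // row sorts after every key starting with the prefix
        }
        boolean matches = cmp == 0 && row.length >= prefix.length;
        return !matches;
    }

    // Hypothetical helper: counts the rows checked before the scan can stop.
    static int rowsExamined(List<String> sortedRows, String prefix) {
        PrefixFilterSketch filter = new PrefixFilterSketch(prefix);
        int examined = 0;
        for (String row : sortedRows) {
            if (filter.filterAllRemaining()) {
                break;
            }
            examined++;
            filter.filterRowKey(row.getBytes(StandardCharsets.UTF_8));
        }
        return examined;
    }

    public static void main(String[] args) {
        // Rows 'a', 'b', 'c' with prefix 'aa': the scan can stop after 'b'.
        System.out.println(rowsExamined(Arrays.asList("a", "b", "c"), "aa")); // prints 2
    }
}
```

With the early return shown in the description, the same three-row scan would examine all three rows, because 'passedPrefix' is never set for rowkeys shorter than the prefix.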

  was:
The PrefixFilter will filter rowkey as:
{code}
  public boolean filterRowKey(Cell firstRowCell) {
    ...
    int length = firstRowCell.getRowLength();
    if (length < prefix.length) return true; // ===> return directly if the prefix is longer
    ....
    if ((!isReversed() && cmp > 0) || (isReversed() && cmp < 0)) {
      passedPrefix = true;
    }
    filterRow = (cmp != 0);
    return filterRow;
  }
{code}
If the prefix is longer than the current rowkey, PrefixFilter#filterRowKey will filter the
rowkey directly without comparing, so that won't set 'passedPrefix' flag even the current
row is larger than the prefix.
For example, if there are three rows 'a', 'b' and 'c' in the table, and we issue a scan request
as:
{code}
hbase(main):001:0> scan 'test_table', {STARTROW => 'a', FILTER => "(PrefixFilter ('aa'))"}
{code}
The region server will check the three rows before returning.  In our production, the user
issue a scan with a PrefixFilter. The prefix is longer than the rowkeys of following millions
of rows, so the region server will continue to check rows until hit a rowkey longer than the
prefix. This make the client easily timeout. To fix this case, it seems we need to compare
the prefix with the rowkey even when the prefix is longer.


> PrefixFilter fail to filter all remainings if the prefix is longer than compared rowkey
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-14397
>                 URL: https://issues.apache.org/jira/browse/HBASE-14397
>             Project: HBase
>          Issue Type: Improvement
>          Components: Filters
>    Affects Versions: 2.0.0
>            Reporter: cuijianwei
>            Priority: Minor



--
This message was sent by Atlassian JIRA (v6.3.4#6332)
