hbase-issues mailing list archives

From "Aleksandr Maksymenko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13705) MultiRowRangeFilter seems to be working incorrect if RowRange.startRowInclusive = false
Date Mon, 18 May 2015 12:39:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547943#comment-14547943 ]

Aleksandr Maksymenko commented on HBASE-13705:

After one more quick look at the code, I see another issue. This time it's related to stopRow.
Let's say we have a RowRange with startRowInclusive = true and stopRowInclusive = false. It's a
pretty common use case.
The issue may appear if we have only one record whose row is exactly the same as stopRow.
In method filterRowKey we look for a row range by calling getNextRangeIndex, and the row range
described above should be found. Although the RowRange is found, the actual row should not be
included. But it will be, as far as I can see in the code.

It seems the easiest solution is to fix the getNextRangeIndex method by replacing:
    // the row key equals one of the start keys, and the the range exclude the start key
    if(rangeList.get(index).startRowInclusive == false) {
      EXCLUSIVE = true;
with something like this (attention, I didn't check whether it even compiles):
    if(!rangeList.get(index).contains(rowKey)) {
      EXCLUSIVE = true;
It seems that this should resolve both issues, but let someone else check it (and test it
if possible).
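
To make the difference between the two conditions concrete, here is a minimal, self-contained sketch. It is not the HBase code: the class RangeCheckSketch, its nested Range type, and the methods excludeByFlag and excludeByContains are hypothetical names used only for illustration, and row keys are plain Strings instead of byte arrays. It assumes that contains() honors both inclusive flags.

```java
// Hypothetical stand-in for the filter logic; not the actual HBase classes.
final class RangeCheckSketch {

    // Simplified row range: String keys instead of byte[] (an assumption).
    static final class Range {
        final String start;
        final boolean startInclusive;
        final String stop;
        final boolean stopInclusive;

        Range(String start, boolean startInclusive, String stop, boolean stopInclusive) {
            this.start = start;
            this.startInclusive = startInclusive;
            this.stop = stop;
            this.stopInclusive = stopInclusive;
        }

        // True only if the row lies inside the range, honoring both flags.
        boolean contains(String row) {
            int s = row.compareTo(start);
            int e = row.compareTo(stop);
            boolean afterStart = startInclusive ? s >= 0 : s > 0;
            boolean beforeStop = stopInclusive ? e <= 0 : e < 0;
            return afterStart && beforeStop;
        }
    }

    // Condition quoted above: looks at the flag only, never at the row key.
    static boolean excludeByFlag(Range range, String row) {
        return !range.startInclusive;
    }

    // Proposed condition: exclude exactly the rows the range does not contain.
    static boolean excludeByContains(Range range, String row) {
        return !range.contains(row);
    }

    public static void main(String[] args) {
        Range openStart = new Range("b", false, "d", true);  // (b, d]
        Range openStop = new Range("b", true, "d", false);   // [b, d)

        // Row "c" is inside (b, d], yet the flag-only check would exclude it.
        System.out.println(excludeByFlag(openStart, "c") + " vs " + excludeByContains(openStart, "c"));
        // Row "d" equals the exclusive stopRow of [b, d), yet the flag-only check keeps it.
        System.out.println(excludeByFlag(openStop, "d") + " vs " + excludeByContains(openStop, "d"));
    }
}
```

In this sketch the contains-based check gives the expected answer in both cases, which is why a single condition could plausibly cover both the startRow and the stopRow problem. Whether the same holds inside the real getNextRangeIndex still needs to be verified against the actual code.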

> MultiRowRangeFilter seems to be working incorrect if RowRange.startRowInclusive = false
> ---------------------------------------------------------------------------------------
>                 Key: HBASE-13705
>                 URL: https://issues.apache.org/jira/browse/HBASE-13705
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.1.0
>            Reporter: Aleksandr Maksymenko
> I found this issue during code review, so I don't have tests, and I haven't even tested
> this case manually. So I'll try to describe it in words.
> Pre-condition: we're using a scan with MultiRowRangeFilter with some RowRanges that have
> startRowInclusive = false. This means that we want to include all rows that are strictly
> greater than startRow (and less than stopRow, but that doesn't matter for now).
> What happens in MultiRowRangeFilter.filterRowKey (worst case is described):
> 1. Line 91: Check if current range contains a row. Let's follow the case when it doesn't.
> 2. Line 94: Search for the next RowRange in method getNextRangeIndex.
> 3. Line 238: We've found a RowRange; check if startRowInclusive == false and set
> EXCLUSIVE = true. This variable indicates whether the next row should be excluded.
> 4. Line 105: Check if EXCLUSIVE == true; if so, skip this row.
> The problem: we've skipped the first row we got in this range, but we never checked whether
> this row is RowRange.startRow. In a distributed system we may not get RowRange.startRow on
> the current instance, so we may exclude some other row. Moreover, RowRange.startRow may not
> exist in the DB at all, in which case we will exclude a row that is (possibly) close to
> RowRange.startRow but not equal to it.
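
The skipped-row scenario described in the issue can be reproduced with a small simulation. This is only a sketch under simplifying assumptions, not the MultiRowRangeFilter implementation: the class SkipFirstRowSketch and both scan methods are hypothetical, row keys are Strings, and there is a single range with an exclusive startRow and an exclusive stopRow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of the reported behavior; not the HBase code.
final class SkipFirstRowSketch {

    // Flag-based variant: once the range is found, drop the first row seen in it,
    // without checking whether that row really equals the start key.
    static List<String> scanSkippingFirstRow(List<String> rows, String start, String stop) {
        List<String> kept = new ArrayList<>();
        boolean exclusive = true; // set because the range has an exclusive start
        for (String row : rows) {
            if (row.compareTo(start) < 0 || row.compareTo(stop) >= 0) {
                continue; // outside the range
            }
            if (exclusive) {
                exclusive = false;
                continue; // the first row in the range is skipped unconditionally
            }
            kept.add(row);
        }
        return kept;
    }

    // Key-based variant: drop a row only if it is not strictly greater than the start key.
    static List<String> scanComparingKeys(List<String> rows, String start, String stop) {
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            if (row.compareTo(start) <= 0 || row.compareTo(stop) >= 0) {
                continue; // at or before the exclusive start, or at or past the stop
            }
            kept.add(row);
        }
        return kept;
    }

    public static void main(String[] args) {
        // The start key "b" itself is not stored; "b1" is the first row in the range.
        List<String> rows = Arrays.asList("a", "b1", "b2", "c");
        System.out.println(scanSkippingFirstRow(rows, "b", "c"));
        System.out.println(scanComparingKeys(rows, "b", "c"));
    }
}
```

When the start key is absent from the data, the flag-based variant loses "b1" while the key-based variant keeps it. When the start key is present, both variants return the same rows, which is why the problem is easy to miss in a simple test.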

This message was sent by Atlassian JIRA
