hbase-issues mailing list archives

From "Heng Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14782) FuzzyRowFilter skips valid rows
Date Thu, 12 Nov 2015 08:43:11 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001848#comment-15001848 ]

Heng Chen commented on HBASE-14782:
-----------------------------------

Thanks [~vrodionov] for your test code.

The cause is visible in the patch:
{code}
-    // NOT FOUND -> seek next using hint
+    // NOT FOUND -> it means this row has been passed, so we jump to next row
     lastFoundIndex = -1;
-    return ReturnCode.SEEK_NEXT_USING_HINT;
+    return ReturnCode.NEXT_ROW;
{code}

FuzzyRowFilter should jump to the next row if the current row does not match.
Currently, when there is no match, FuzzyRowFilter always returns SEEK_NEXT_USING_HINT.
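
To make the patched decision concrete, here is a toy Java model of it. It is a simplified sketch, not HBase's FuzzyRowFilter: it assumes the mask convention documented for FuzzyRowFilter (0 = the byte is fixed and must match, 1 = any byte is allowed), and the rule 9C 00 04 34 ?? ?? ?? ?? is hypothetical, because the report does not show the customer's filter criteria. The class, Rule, bytes() and the Decision enum are illustrative names; Decision merely stands in for the INCLUDE and NEXT_ROW values of Filter.ReturnCode.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the per-row decision in the patch; NOT HBase's FuzzyRowFilter. */
public class FuzzyDecisionSketch {
    /** Illustrative stand-in for the INCLUDE and NEXT_ROW values of Filter.ReturnCode. */
    enum Decision { INCLUDE, NEXT_ROW }

    /** One fuzzy rule: key bytes plus a mask (assumed: 0 = fixed byte, 1 = any byte). */
    static final class Rule {
        final byte[] key;
        final byte[] mask;

        Rule(byte[] key, byte[] mask) {
            this.key = key;
            this.mask = mask;
        }
    }

    /** Builds a byte[] from unsigned values such as 0x9C. */
    static byte[] bytes(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    /** True if every fixed position of the rule equals the row's byte at that position. */
    static boolean matches(byte[] row, Rule rule) {
        if (row.length < rule.key.length) {
            return false; // simplification: rows shorter than the rule never match
        }
        for (int i = 0; i < rule.key.length; i++) {
            if (rule.mask[i] == 0 && row[i] != rule.key[i]) {
                return false;
            }
        }
        return true;
    }

    /** Post-patch behaviour: if no rule matches the current row, go to the next row. */
    static Decision decide(byte[] row, List<Rule> rules) {
        for (Rule rule : rules) {
            if (matches(row, rule)) {
                return Decision.INCLUDE;
            }
        }
        return Decision.NEXT_ROW;
    }

    /** Hypothetical rule: bytes 9C 00 04 34 fixed, last four bytes free. */
    static final List<Rule> RULES = Arrays.asList(new Rule(
            bytes(0x9C, 0x00, 0x04, 0x34, 0, 0, 0, 0),
            bytes(0, 0, 0, 0, 1, 1, 1, 1)));

    public static void main(String[] args) {
        byte[] missing = bytes(0x9C, 0x00, 0x04, 0x34, 0x00, 0x00, 0x00, 0x00);
        byte[] causing = bytes(0x9C, 0x00, 0x03, 0xE9, 0x65, 0xBB, 0x7B, 0x58, 0x1F,
                0x77, 0x74, 0x73, 0x1F, 0x15, 0x76, 0x52, 0x58);
        System.out.println(decide(missing, RULES)); // INCLUDE
        System.out.println(decide(causing, RULES)); // NEXT_ROW
    }
}
```

Under this model the causing row fails on its third byte (0x03 instead of the fixed 0x04), so the patched filter simply moves on to the next row rather than computing a seek hint.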

I am not sure what the difference is between StoreScanner.seekAsDirection and StoreScanner.seekToNextRow,
but currently, if we take the StoreScanner.seekAsDirection path (FuzzyRowFilter returns SEEK_NEXT_USING_HINT),
StoreScanner.heap.peek() returns null. The heap is then set to null in StoreScanner.close, so the remaining rows are never processed.

The relevant code in StoreScanner.next:
{code}
LOOP: do {
  // ...
  ScanQueryMatcher.MatchCode qcode = matcher.match(cell);
  qcode = optimize(qcode, cell);
  switch (qcode) {
    // ...
    case SEEK_NEXT_ROW:
      // This is just a relatively simple end of scan fix, to short-cut end
      // us if there is an endKey in the scan.
      if (!matcher.moreRowsMayExistAfter(cell)) {
        return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
      }
      seekToNextRow(cell);
      break;
    // ...
    case SEEK_NEXT_USING_HINT:
      Cell nextKV = matcher.getNextKeyHint(cell);
      if (nextKV != null) {
        seekAsDirection(nextKV);
      } else {
        heap.next();
      }
      break;
    default:
      throw new RuntimeException("UNEXPECTED");
  }
} while ((cell = this.heap.peek()) != null);

if (count > 0) {
  return scannerContext.setScannerState(NextState.MORE_VALUES).hasMoreValues();
}
close(false); // sets heap to null, so the remaining rows are never processed
return scannerContext.setScannerState(NextState.NO_MORE_VALUES).hasMoreValues();
{code}
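
The two return codes take different paths through this loop: seekToNextRow in the SEEK_NEXT_ROW case, seekAsDirection in the SEEK_NEXT_USING_HINT case. As a general property, independent of the specific cause of this bug, moving to the next row advances exactly one row and cannot skip anything, whereas a hinted seek skips every row between the current row and the hint. If the hint lies past the last remaining row, peek() returns null, the do/while exits and close(false) runs. The toy model below illustrates this with string keys; SeekModel, rows(), nextRow() and seekTo() are hypothetical names, and TreeSet ordering stands in for HBase's row order.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Toy forward-only scanner over sorted row keys; NOT HBase's StoreScanner. */
public class SeekModel {
    /** Builds a sorted set of row keys (natural String order stands in for byte order). */
    static TreeSet<String> rows(String... keys) {
        return new TreeSet<>(Arrays.asList(keys));
    }

    /** Next-row path: the row right after the current one, or null at the end. */
    static String nextRow(TreeSet<String> rows, String current) {
        return rows.higher(current);
    }

    /** Hint path: the first row >= hint; every row in between is skipped. */
    static String seekTo(TreeSet<String> rows, String hint) {
        return rows.ceiling(hint);
    }

    public static void main(String[] args) {
        TreeSet<String> r = rows("a", "b", "c");
        System.out.println(nextRow(r, "a")); // b
        System.out.println(seekTo(r, "bb")); // c  ("b" is never examined)
        System.out.println(seekTo(r, "z"));  // null: nothing left to peek, so the scan ends
    }
}
```

This is why returning NEXT_ROW for a non-matching row is the conservative choice: its correctness does not depend on the seek hint being accurate.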

> FuzzyRowFilter skips valid rows
> -------------------------------
>
>                 Key: HBASE-14782
>                 URL: https://issues.apache.org/jira/browse/HBASE-14782
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>         Attachments: HBASE-14782.patch
>
>
> The issue may affect not only master branch, but previous releases as well.
> This is from one of our customers:
> {quote}
> We are experiencing a problem with the FuzzyRowFilter for HBase scan. We think that it is a bug.
> A fuzzy filter should pick a row if it matches the filter criteria, irrespective of other rows present in the table, but the filter is dropping a row depending on some other row present in the table.
> Details, steps to reproduce, and sample outputs are below:
> Missing row key: \x9C\x00\x044\x00\x00\x00\x00 
> Causing row key: \x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX 
> Prerequisites 
> 1. Create a test table. HBase shell command: create 'fuzzytest','d'
> 2. Insert some test data. HBase shell commands: 
> • put 'fuzzytest',"\x9C\x00\x044\x00\x00\x00\x00",'d:a','junk' 
> • put 'fuzzytest',"\x9C\x00\x044\x01\x00\x00\x00",'d:a','junk' 
> • put 'fuzzytest',"\x9C\x00\x044\x00\x01\x00\x00",'d:a','junk' 
> • put 'fuzzytest',"\x9C\x00\x044\x00\x00\x01\x00",'d:a','junk' 
> • put 'fuzzytest',"\x9C\x00\x044\x00\x01\x00\x01",'d:a','junk' 
> • put 'fuzzytest',"\x9B\x00\x044e\xBB\xB2\xBB",'d:a','junk' 
> • put 'fuzzytest',"\x9D\x00\x044e\xBB\xB2\xBB",'d:a','junk' 
> Now when you run the code, you will find \x9C\x00\x044\x00\x00\x00\x00 in the output because it matches the filter criteria. (Refer to how to run the code below.)
> Insert the row key causing bug: 
> HBase shell command: put 'fuzzytest',"\x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX",'d:a','junk'
> Now when you run the code, you will not find \x9C\x00\x044\x00\x00\x00\x00 in the output, even though it still matches the filter criteria.
> {quote}
> Verified the issue on master.
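
The repro keys are easier to reason about once decoded. In the HBase shell (JRuby), a double-quoted \xNN is one byte, so \x044 is the two bytes 0x04 and 0x34 ('4'). HBase orders rows by unsigned byte-wise comparison. The self-contained sketch below sorts the test keys that way. RowKeyOrder, decode() and compareUnsigned() are illustrative names; HBase's own Bytes.toBytesBinary performs a similar decoding. The decoder assumes every escape has exactly two hex digits, which holds for all keys in the report.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Orders the repro row keys the way HBase sorts rows: unsigned, byte by byte. */
public class RowKeyOrder {
    static final String MISSING = "\\x9C\\x00\\x044\\x00\\x00\\x00\\x00";
    static final String CAUSING = "\\x9C\\x00\\x03\\xE9e\\xBB{X\\x1Fwts\\x1F\\x15vRX";
    static final List<String> KEYS = Arrays.asList(
            MISSING,
            "\\x9C\\x00\\x044\\x01\\x00\\x00\\x00",
            "\\x9C\\x00\\x044\\x00\\x01\\x00\\x00",
            "\\x9C\\x00\\x044\\x00\\x00\\x01\\x00",
            "\\x9C\\x00\\x044\\x00\\x01\\x00\\x01",
            "\\x9B\\x00\\x044e\\xBB\\xB2\\xBB",
            "\\x9D\\x00\\x044e\\xBB\\xB2\\xBB",
            CAUSING);

    /** Decodes shell-style escapes: "\xNN" (two hex digits) is one byte, other chars are ASCII. */
    static byte[] decode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            if (s.charAt(i) == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x') {
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                out.write(s.charAt(i));
                i++;
            }
        }
        return out.toByteArray();
    }

    /** Lexicographic comparison treating each byte as unsigned (0x00..0xFF). */
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xFF;
            int y = b[i] & 0xFF;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length; // a proper prefix sorts first
    }

    /** Returns the keys in the order a scan would visit them. */
    static List<String> sorted() {
        List<String> copy = new ArrayList<>(KEYS);
        copy.sort((x, y) -> compareUnsigned(decode(x), decode(y)));
        return copy;
    }

    public static void main(String[] args) {
        for (String key : sorted()) {
            System.out.println(key);
        }
    }
}
```

Among the test keys, the causing key (third byte 0x03) sorts directly before the missing key (third byte 0x04), so a scan reaches the causing row immediately before the row that goes missing.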



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
