hbase-issues mailing list archives

From "Heng Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-14782) FuzzyRowFilter skips valid rows
Date Fri, 13 Nov 2015 11:25:10 GMT

    [ https://issues.apache.org/jira/browse/HBASE-14782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15003873#comment-15003873 ]

Heng Chen commented on HBASE-14782:

I found something more.
Both StoreScanner.seekAsDirection and StoreScanner.seekToNextRow call StoreScanner.reseek(Cell kv) internally.
The only difference is the Cell they pass in.

In StoreScanner.seekToNextRow, the Cell passed to reseek is generated by CellUtil.createLastOnRow.
In StoreScanner.seekAsDirection, it is generated by matcher.getNextKeyHint, which calls FuzzyRowFilter.getNextCellHint internally.

CellUtil.createLastOnRow(Cell kv) creates a cell in the same row as kv, but with Long.MIN_VALUE as its timestamp.
FuzzyRowFilter.getNextCellHint(Cell kv) creates a cell in the next row, with Long.MAX_VALUE as its timestamp.
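
To make the effect of the two timestamps concrete, here is a minimal, self-contained sketch. It does not use the HBase classes: Key, ORDER, firstOnRow and lastOnRow are hypothetical stand-ins, and the comparator only models the two parts of the key ordering that matter here (rows ascending as unsigned bytes, then timestamps descending within a row). The stored timestamp is an arbitrary value chosen for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

public class SeekKeyOrder {
    /** Hypothetical stand-in for a cell key: row bytes plus timestamp only. */
    static final class Key {
        final byte[] row;
        final long ts;

        Key(byte[] row, long ts) {
            this.row = row;
            this.ts = ts;
        }
    }

    // Rows ascend as unsigned bytes; within a row, larger (newer) timestamps sort first.
    static final Comparator<Key> ORDER = (a, b) -> {
        int c = Arrays.compareUnsigned(a.row, b.row);
        return c != 0 ? c : Long.compare(b.ts, a.ts);
    };

    // Models a seek key placed before every real cell of the row (Long.MAX_VALUE).
    static Key firstOnRow(byte[] row) {
        return new Key(row, Long.MAX_VALUE);
    }

    // Models a seek key placed after every real cell of the row (Long.MIN_VALUE).
    static Key lastOnRow(byte[] row) {
        return new Key(row, Long.MIN_VALUE);
    }

    public static void main(String[] args) {
        byte[] row = {(byte) 0x9C, 0x00, 0x04, '4', 0, 0, 0, 0};
        Key stored = new Key(row, 1447413910000L); // arbitrary timestamp

        System.out.println(ORDER.compare(firstOnRow(row), stored) < 0); // true
        System.out.println(ORDER.compare(lastOnRow(row), stored) > 0);  // true
    }
}
```

Under this model, a reseek to a Long.MIN_VALUE key skips past the rest of the current row, while a reseek to a Long.MAX_VALUE key lands in front of the first cell of the hinted row.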

The seek then runs through the logic below (in {{KeyValueHeap.generalizedSeek}}):

    if (current == null) {
      return false;
    }
    heap.add(current);
    current = null;

    KeyValueScanner scanner;
    while ((scanner = heap.poll()) != null) {
      Cell topKey = scanner.peek();
      if (comparator.getComparator().compare(seekKey, topKey) <= 0) {
        // Top key is at or after seekKey: put the scanner back and stop.
        heap.add(scanner);
        current = pollRealKV();
        return current != null;
      }

      boolean seekResult;
      if (isLazy && heap.size() > 0) {
        seekResult = scanner.requestSeek(seekKey, forward, useBloom);
      } else {
        seekResult = NonLazyKeyValueScanner.doRealSeek(
            scanner, seekKey, forward);
      }

      if (!seekResult) {
        // Nothing at or after seekKey in this scanner: drop it.
        scanner.close();
      } else {
        heap.add(scanner);
      }
    }

    // Heap is returning empty, scanner is done
    return false;

For example, suppose we put just two rows into the table: "\x9C\x00\x044\x00\x00\x00\x00"
and "\x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX".

With the original logic we take the StoreScanner.seekAsDirection path,
so seekKey in KeyValueHeap.generalizedSeek is "\x9C\x00\x044\x00\x00\x00\x00"
with Long.MAX_VALUE as its timestamp.

In the first iteration of the while loop, topKey is "\x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX",
so "if (comparator.getComparator().compare(seekKey, topKey) <= 0)" is false, and
NonLazyKeyValueScanner.doRealSeek cannot find seekKey.

In the end, KeyValueHeap.heap is empty and KeyValueHeap.current is null.
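
The row comparison in the first iteration can be checked in isolation. This is only a sketch: it compares the two row keys as unsigned bytes with the JDK's Arrays.compareUnsigned (Java 9+), which models the row part of the key ordering but is not the HBase comparator itself. RowKeyCompare, bytes and compareRows are hypothetical names.

```java
import java.util.Arrays;

public class RowKeyCompare {
    // Builds a byte[] from int literals so values above 0x7F need no casts.
    static byte[] bytes(int... values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    // Lexicographic comparison treating each byte as unsigned.
    static int compareRows(byte[] a, byte[] b) {
        return Arrays.compareUnsigned(a, b);
    }

    public static void main(String[] args) {
        // \x9C\x00\x044\x00\x00\x00\x00
        byte[] seekRow = bytes(0x9C, 0x00, 0x04, '4', 0x00, 0x00, 0x00, 0x00);
        // \x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX
        byte[] topRow = bytes(0x9C, 0x00, 0x03, 0xE9, 'e', 0xBB, '{', 'X',
                0x1F, 'w', 't', 's', 0x1F, 0x15, 'v', 'R', 'X');

        // The keys first differ at the third byte (0x04 vs 0x03), so seekRow sorts
        // after topRow and a "seekKey <= topKey" check fails.
        System.out.println(compareRows(seekRow, topRow) > 0); // true
    }
}
```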


> FuzzyRowFilter skips valid rows
> -------------------------------
>                 Key: HBASE-14782
>                 URL: https://issues.apache.org/jira/browse/HBASE-14782
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>            Reporter: Vladimir Rodionov
>            Assignee: Heng Chen
>         Attachments: HBASE-14782.patch
> The issue may affect not only master branch, but previous releases as well.
> This is from one of our customers:
> {quote}
> We are experiencing a problem with the FuzzyRowFilter for HBase scan. We think that it is a bug.
> Fuzzy filter should pick a row if it matches filter criteria irrespective of other rows present in table but filter is dropping a row depending on some other row present in table.

> Details/Step to reproduce/Sample outputs below: 
> Missing row key: \x9C\x00\x044\x00\x00\x00\x00 
> Causing row key: \x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX 
> Prerequisites 
> 1. Create a test table. HBase shell command -- create 'fuzzytest','d' 
> 2. Insert some test data. HBase shell commands: 
> • put 'fuzzytest',"\x9C\x00\x044\x00\x00\x00\x00",'d:a','junk' 
> • put 'fuzzytest',"\x9C\x00\x044\x01\x00\x00\x00",'d:a','junk' 
> • put 'fuzzytest',"\x9C\x00\x044\x00\x01\x00\x00",'d:a','junk' 
> • put 'fuzzytest',"\x9C\x00\x044\x00\x00\x01\x00",'d:a','junk' 
> • put 'fuzzytest',"\x9C\x00\x044\x00\x01\x00\x01",'d:a','junk' 
> • put 'fuzzytest',"\x9B\x00\x044e\xBB\xB2\xBB",'d:a','junk' 
> • put 'fuzzytest',"\x9D\x00\x044e\xBB\xB2\xBB",'d:a','junk' 
> Now when you run the code, you will find \x9C\x00\x044\x00\x00\x00\x00 in output because it matches filter criteria. (Refer how to run code below)
> Insert the row key causing bug: 
> HBase shell command: put 'fuzzytest',"\x9C\x00\x03\xE9e\xBB{X\x1Fwts\x1F\x15vRX",'d:a','junk'

> Now when you run the code, you will not find \x9C\x00\x044\x00\x00\x00\x00 in output even though it still matches filter criteria.
> {quote}
> Verified the issue on master.
