hbase-dev mailing list archives

From Huaxiang Sun <h...@cloudera.com>
Subject Inconsistent behavior of scan with filter and maxVersions?
Date Tue, 28 Feb 2017 01:26:20 GMT
Hi HBase Devs,

    Nicolae Popa found inconsistent behavior when scanning with a filter on a column family
that has maxVersions configured.
    Let's start with an example.

hbase(main):001:0> create 't1', {NAME => 'f1', VERSIONS => 1}
hbase(main):002:0> put 't1', 'r1', 'f1:q1', 'a'
hbase(main):003:0> put 't1', 'r1', 'f1:q1', 'b'

// There are two versions for r1, f1:q1

hbase(main):004:0> scan 't1'
ROW                  COLUMN+CELL
 r1                  column=f1:q1, timestamp=1488244089712, value=b
1 row(s)

// Scan with value filter 'a' returns the cell for 'a', even though maxVersions is configured to be 1
hbase(main):006:0> scan 't1', {FILTER => "ValueFilter(=,'binary:a')"}
ROW                  COLUMN+CELL
 r1                  column=f1:q1, timestamp=1488244087738, value=a
1 row(s)
hbase(main):007:0> scan 't1', {FILTER => "ValueFilter(=,'binary:b')"}
ROW                  COLUMN+CELL
 r1                  column=f1:q1, timestamp=1488244089712, value=b
1 row(s)

// After flush and major compaction, the older version is deleted from hfile.
hbase(main):011:0> flush 't1'
hbase(main):012:0> major_compact 't1'
hbase(main):013:0> scan 't1', {FILTER => "ValueFilter(=,'binary:b')"}
ROW                  COLUMN+CELL
 r1                  column=f1:q1, timestamp=1488244089712, value=b
1 row(s)

// Scan with value filter 'a' returns nothing now.
hbase(main):014:0> scan 't1', {FILTER => "ValueFilter(=,'binary:a')"}
ROW                  COLUMN+CELL
0 row(s)
hbase(main):015:0> 

In the above example, the scan result for ValueFilter 'a' differs before and after the flush
and major compaction. The reason is that when the filter returns SKIP, the version count is not
incremented, so the older version is treated as the latest version.
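
To make the ordering concrete, here is a toy Ruby model of the behavior described above. It is
only a sketch, not HBase code: Cell, scan_column, and the count_skipped flag are names made up
for illustration, and the flag represents one possible alternative behavior rather than anything
HBase provides.

```ruby
# Toy model of per-column version counting during a scan.
# NOT HBase code: Cell, scan_column and count_skipped are hypothetical names.
Cell = Struct.new(:value, :timestamp)

# Returns the values a scan would emit for a single column.
#   cells         - all stored versions of the column
#   max_versions  - VERSIONS setting of the column family
#   filter        - lambda; true means include the cell, false means SKIP
#   count_skipped - when true, a cell skipped by the filter still counts
#                   as a version (assumed alternative behavior)
def scan_column(cells, max_versions, filter, count_skipped: false)
  seen = 0
  result = []
  # Walk the versions newest first, as a scan does within one column.
  cells.sort_by { |c| -c.timestamp }.each do |cell|
    break if seen >= max_versions
    if filter.call(cell)
      seen += 1
      result << cell.value
    elsif count_skipped
      seen += 1
    end
  end
  result
end

# Before compaction both versions are still stored; afterwards only 'b' is.
uncompacted = [Cell.new('a', 1488244087738), Cell.new('b', 1488244089712)]
compacted   = [Cell.new('b', 1488244089712)]
value_is_a  = ->(cell) { cell.value == 'a' }

before_compaction = scan_column(uncompacted, 1, value_is_a)
after_compaction  = scan_column(compacted, 1, value_is_a)
strict_count      = scan_column(uncompacted, 1, value_is_a, count_skipped: true)

p before_compaction  # ["a"]
p after_compaction   # []
p strict_count       # []
```

With skipped cells not counted, the model reproduces the shell session: 'a' is returned before
compaction and nothing afterwards. If skipped cells counted toward maxVersions, both scans would
return nothing.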

Is this the expected behavior? When maxVersions is specified in the HCD (HColumnDescriptor), is
the user supposed to see only the latest maxVersions versions, or can the result be affected by
filters? Note that this is not a raw scan.

Thanks,
Huaxiang Sun