hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HBase scan returns inconsistent results on multiple runs for same dataset
Date Wed, 01 Mar 2017 13:33:54 GMT
Which HBase version are you using?

Does it include the fix for HBASE-15378?

> On Mar 1, 2017, at 5:02 AM, Hef <hef.online@gmail.com> wrote:
> 
> Hi,
> I'm seeing strange behavior in MapReduce when using HBase as the input
> format. I run my MR job on the same table and the same dataset, with the
> same FuzzyRowFilter pattern, multiple times. The Input Records counters
> are not consistent: the smallest number can be 40% less than the
> largest one.
> 
> More specifically,
> - The table is split into 18 regions, distributed across 3 region servers.
> The TTL is set to 10 days for the records, though the dataset for the MR
> job only includes those inserted in the last 7 days.
> 
> - The row key is defined as:
> salt(1byte) + time_of_hour(4bytes) + uuid(36bytes)
> 
> 
> - The scan is created as below:
> 
> Scan scan = new Scan();
> scan.setBatch(100);
> scan.setCaching(10000);
> scan.setCacheBlocks(false);
> scan.setMaxVersions(1);
> 
> 
> And the row filter for the scan is a FuzzyRowFilter that filters only
> events of a given time_of_hour.
> 
> Everything looks fine, but the results are not what I expect.
> When the same task runs 10 times, the Input Records counters show 6
> different numbers, and the final output shows 6 different results.
> 
> Has anyone faced this problem before?
> What could cause this inconsistency in the HBase scan results?
> 
> Thanks
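
For reference, here is a minimal sketch of how the fuzzy key and mask for the row key layout quoted above (1-byte salt, 4-byte time_of_hour, 36-byte uuid) might be built. It is plain Java with no HBase dependency, so the matching logic can be checked locally. The class and method names are hypothetical, the big-endian int encoding of time_of_hour is an assumption, and matches() is a simplified stand-in for what a fuzzy row match does, not HBase's implementation. The mask uses the FuzzyRowFilter convention: 0 means the byte is fixed, 1 means any byte is accepted.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FuzzyKeySketch {
    // Layout taken from the quoted message: salt(1) + time_of_hour(4) + uuid(36).
    public static final int SALT_LEN = 1;
    public static final int HOUR_LEN = 4;
    public static final int UUID_LEN = 36;
    public static final int KEY_LEN = SALT_LEN + HOUR_LEN + UUID_LEN;

    // Returns {key, mask}. Only the time_of_hour bytes are fixed (mask 0);
    // the salt and uuid positions are wildcards (mask 1).
    // Assumption: time_of_hour is stored as a big-endian 4-byte int.
    public static byte[][] buildKeyAndMask(int timeOfHour) {
        byte[] key = new byte[KEY_LEN];
        byte[] mask = new byte[KEY_LEN];
        Arrays.fill(mask, (byte) 1);
        ByteBuffer.wrap(key, SALT_LEN, HOUR_LEN).putInt(timeOfHour);
        Arrays.fill(mask, SALT_LEN, SALT_LEN + HOUR_LEN, (byte) 0);
        return new byte[][] {key, mask};
    }

    // Simplified local check: a row matches if every fixed position equals the key.
    public static boolean matches(byte[] row, byte[] key, byte[] mask) {
        if (row.length != key.length) {
            return false;
        }
        for (int i = 0; i < key.length; i++) {
            if (mask[i] == 0 && row[i] != key[i]) {
                return false;
            }
        }
        return true;
    }

    // Builds a sample row key in the same layout, for testing the mask.
    public static byte[] sampleRow(byte salt, int timeOfHour, String uuid) {
        return ByteBuffer.allocate(KEY_LEN)
                .put(salt)
                .putInt(timeOfHour)
                .put(uuid.getBytes(StandardCharsets.US_ASCII))
                .array();
    }

    public static void main(String[] args) {
        byte[][] km = buildKeyAndMask(413000);
        byte[] row = sampleRow((byte) 7, 413000, "123e4567-e89b-12d3-a456-426614174000");
        System.out.println("row matches: " + matches(row, km[0], km[1]));
    }
}
```

In a real job the two arrays would be wrapped in a Pair and passed in a list to the FuzzyRowFilter constructor, then attached with scan.setFilter(...). Checking the mask locally like this helps rule out a filter definition problem before looking at scanner behavior on the server side.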
