hbase-user mailing list archives

From Hef <hef.onl...@gmail.com>
Subject HBase scan returns inconsistent results on multiple runs for same dataset
Date Wed, 01 Mar 2017 13:02:04 GMT
I'm encountering strange behavior in MapReduce when using HBase as the input
format. I run my MR job on the same table and the same dataset, with the same
FuzzyRowFilter pattern, multiple times. The Input Records counters are not
consistent between runs: the smallest count can be 40% lower than the
largest one.

More specifically,
- The table is split into 18 regions, distributed across 3 region servers.
The TTL is set to 10 days for the records, though the dataset for the MR job
only includes records inserted in the last 7 days.

- The row key is defined as:
salt(1byte) + time_of_hour(4bytes) + uuid(36bytes)

- The scan is created as below:

Scan scan = new Scan();

And the row filter for the scan is a FuzzyRowFilter that filters only
events of a given time_of_hour.
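
For readers following along, the key layout and fuzzy pattern described above can be sketched without any HBase dependency. This is only an illustration under stated assumptions: that time_of_hour is stored as a big-endian 4-byte int, that the uuid is its 36-character ASCII string form, and that the mask follows the FuzzyRowFilter convention of 0 for "byte must match" and 1 for "any byte". The class name, helper names, and the sample hour value are hypothetical, and matches() is a simplified local stand-in, not HBase's implementation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class FuzzyKeySketch {
    static final int SALT_LEN = 1, HOUR_LEN = 4, UUID_LEN = 36;
    static final int KEY_LEN = SALT_LEN + HOUR_LEN + UUID_LEN; // 41 bytes

    // Full row key: salt + time_of_hour + uuid (encodings are assumptions).
    static byte[] rowKey(byte salt, int timeOfHour, String uuid) {
        byte[] uuidBytes = uuid.getBytes(StandardCharsets.US_ASCII);
        if (uuidBytes.length != UUID_LEN) {
            throw new IllegalArgumentException("uuid must be 36 characters");
        }
        return ByteBuffer.allocate(KEY_LEN)
                .put(salt).putInt(timeOfHour).put(uuidBytes).array();
    }

    // Key template: only the 4 time_of_hour bytes carry meaning, the rest stay 0.
    static byte[] fuzzyKey(int timeOfHour) {
        return ByteBuffer.allocate(KEY_LEN)
                .put((byte) 0).putInt(timeOfHour).array();
    }

    // Mask: 0 = position is fixed (must match), 1 = position is a wildcard.
    static byte[] fuzzyMask() {
        byte[] mask = new byte[KEY_LEN];
        Arrays.fill(mask, (byte) 1);                      // salt and uuid: any value
        Arrays.fill(mask, SALT_LEN, SALT_LEN + HOUR_LEN, (byte) 0); // hour: fixed
        return mask;
    }

    // Simplified local check of the fuzzy-match rule, for illustration only.
    static boolean matches(byte[] row, byte[] key, byte[] mask) {
        if (row.length < key.length) return false;
        for (int i = 0; i < key.length; i++) {
            if (mask[i] == 0 && row[i] != key[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int hour = 413000; // hypothetical time_of_hour value
        String uuid = "123e4567-e89b-12d3-a456-426614174000";
        byte[] key = fuzzyKey(hour);
        byte[] mask = fuzzyMask();
        System.out.println(matches(rowKey((byte) 3, hour, uuid), key, mask));     // true
        System.out.println(matches(rowKey((byte) 3, hour + 1, uuid), key, mask)); // false
    }
}
```

With HBase on the classpath, the two arrays would presumably be wrapped in a Pair, passed to the FuzzyRowFilter constructor as a list, and attached with scan.setFilter(...). Note that every salt value matches this pattern, so the scan has to cover all regions.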

Everything looks fine, but the results are not what I expect.
When the same task is run 10 times, the Input Records counters show 6
different numbers, and the final output shows 6 different results.

Has anyone ever faced this problem before?
What could cause this inconsistency in HBase scan results?

