hbase-user mailing list archives

From Sean Busbey <bus...@apache.org>
Subject Re: HBase scan returns inconsistent results on multiple runs for same dataset
Date Wed, 01 Mar 2017 16:09:55 GMT
The place to check for JIRAs included on top of those in the ASF release is here:

http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.9.1.releasenotes.html

HBASE-15378 is not in CDH5.9.1.

On Wed, Mar 1, 2017 at 9:58 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> I don't see it here:
>
> http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.9.1.CHANGES.txt?_ga=1.10311413.1914112506.1454459553
>
> On Wed, Mar 1, 2017 at 5:46 AM, Hef <hef.online@gmail.com> wrote:
>
>> I'm using CDH 5.9; the documentation shows its HBase version is
>> hbase-1.2.0+cdh5.9.1+222
>> (https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_59.html).
>> I have no idea whether HBASE-15378 is included.
>>
>> On Wed, Mar 1, 2017 at 9:33 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>> > Which hbase version are you using ?
>> >
>> > Does it include HBASE-15378 ?
>> >
>> > > On Mar 1, 2017, at 5:02 AM, Hef <hef.online@gmail.com> wrote:
>> > >
>> > > Hi,
>> > > I'm seeing strange behavior in MapReduce when using HBase as the input
>> > > format. I run my MR job on the same table and the same dataset, with the
>> > > same FuzzyRowFilter pattern, multiple times. The Input Records counters
>> > > are not consistent across runs: the smallest value can be 40% lower than
>> > > the largest.
>> > >
>> > > More specifically,
>> > > - The table is split into 18 regions, distributed across 3 region servers.
>> > > The TTL is set to 10 days for the records, though the dataset for the MR
>> > > job only includes records inserted within the last 7 days.
>> > >
>> > > - The row key is defined as:
>> > > salt(1 byte) + time_of_hour(4 bytes) + uuid(36 bytes)
>> > >
>> > >
>> > > - The scan is created as below:
>> > >
>> > > Scan scan = new Scan();
>> > > scan.setBatch(100);
>> > > scan.setCaching(10000);
>> > > scan.setCacheBlocks(false);
>> > > scan.setMaxVersions(1);
>> > >
>> > >
>> > > And the row filter for the scan is a FuzzyRowFilter that filters only
>> > > events of a given time_of_hour.
>> > >
>> > > Everything looks fine, but the results are not what I expect.
>> > > When the same task is run 10 times, the Input Records counters show 6
>> > > different numbers, and the final output shows 6 different results.
>> > >
>> > > Has anyone faced this problem before?
>> > > What could cause this inconsistency in HBase scan results?
>> > >
>> > > Thanks
>> >
>>
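
For reference, the row key layout and FuzzyRowFilter pattern described in the
quoted message can be sketched in plain Java as below. This is only an
illustration under stated assumptions, not the original poster's code: the
class and method names are hypothetical, time_of_hour is assumed to be stored
as a 4-byte big-endian int, and matches() is a simplified local model of the
filter's documented mask rule (0 = byte must match, 1 = any byte) that only
handles rows of exactly the key length. It does not reproduce or diagnose the
inconsistent counts.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper; models salt(1 byte) + time_of_hour(4 bytes) + uuid(36 bytes).
final class FuzzyKeySketch {
    static final int SALT_LEN = 1;
    static final int HOUR_LEN = 4;
    static final int UUID_LEN = 36;
    static final int KEY_LEN = SALT_LEN + HOUR_LEN + UUID_LEN;

    // Template key: only the time_of_hour bytes are meaningful,
    // the salt and uuid positions are zero-filled placeholders.
    static byte[] fuzzyKey(int timeOfHour) {
        byte[] key = new byte[KEY_LEN];
        // Assumption: time_of_hour is a big-endian int (ByteBuffer's default order).
        ByteBuffer.wrap(key, SALT_LEN, HOUR_LEN).putInt(timeOfHour);
        return key;
    }

    // Mask: 0 = this position must equal the template, 1 = any byte is accepted.
    static byte[] fuzzyMask() {
        byte[] mask = new byte[KEY_LEN];
        Arrays.fill(mask, (byte) 1);
        Arrays.fill(mask, SALT_LEN, SALT_LEN + HOUR_LEN, (byte) 0);
        return mask;
    }

    // Simplified local model of the mask rule, for checking a pattern
    // without a cluster. Rows of any other length are rejected here.
    static boolean matches(byte[] row, byte[] key, byte[] mask) {
        if (row.length != key.length) {
            return false;
        }
        for (int i = 0; i < key.length; i++) {
            if (mask[i] == 0 && row[i] != key[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int hour = 413_000; // hypothetical time_of_hour value
        byte[] key = fuzzyKey(hour);
        byte[] mask = fuzzyMask();

        // A row with a different salt and uuid but the same hour.
        byte[] row = key.clone();
        row[0] = 7;
        Arrays.fill(row, SALT_LEN + HOUR_LEN, KEY_LEN, (byte) 'a');

        System.out.println("same hour matches: " + matches(row, key, mask));
        System.out.println("other hour matches: " + matches(fuzzyKey(hour + 1), key, mask));
    }
}
```

In an HBase 1.x client, the two arrays would typically be passed to the filter
as new FuzzyRowFilter(Arrays.asList(new Pair<>(key, mask))), with Pair taken
from org.apache.hadoop.hbase.util, and attached to the scan with
scan.setFilter(...).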
