hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: HBase scan returns inconsistent results on multiple runs for same dataset
Date Fri, 03 Mar 2017 04:04:31 GMT
Since the ClientScanner cache might or might not have been empty during your
test runs, it is hard to tell whether you hit the bug described by HBASE-15378.

I would suggest you upgrade to a release that includes HBASE-15378.

On Thu, Mar 2, 2017 at 7:59 PM, Hef <hef.online@gmail.com> wrote:

> Thanks for the hint, which led me to investigate the client side and
> finally resolve this problem.
>
> I reviewed the code and found that my project was using 1.0.0-cdh5.6.1, an
> old version of hbase-client. After updating to 1.2.0-cdh5.9.0, matching the
> version the server is running, my tasks work correctly.
>
> I looked into the source of HBase 1.2.0-cdh5.9.0, and HBASE-15378 is not
> patched there. I also went through all the release notes from CDH HBase 5.6
> to 5.9, and nothing about this inconsistent scan behavior is mentioned.
> Though the problem is resolved for now, I have no idea what the root cause
> actually is, or whether it will come back without HBASE-15378 if my dataset
> grows larger.
>
>
>
> On Thu, Mar 2, 2017 at 12:09 AM, Sean Busbey <busbey@apache.org> wrote:
>
> > The place to check for JIRAs included on top of those in the ASF release
> > is here:
> >
> > http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.9.1.releasenotes.html
> >
> > HBASE-15378 is not in CDH5.9.1.
> >
> > On Wed, Mar 1, 2017 at 9:58 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > I don't see it here:
> > >
> > > http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.9.1.CHANGES.txt?_ga=1.10311413.1914112506.1454459553
> > >
> > > On Wed, Mar 1, 2017 at 5:46 AM, Hef <hef.online@gmail.com> wrote:
> > >
> > >> I'm using CDH 5.9, and the documentation shows its HBase version is
> > >> hbase-1.2.0+cdh5.9.1+222.  (
> > >> https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_59.html
> > >> )
> > >> I have no idea whether HBASE-15378 is included.
> > >>
> > >> On Wed, Mar 1, 2017 at 9:33 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >>
> > >> > Which HBase version are you using?
> > >> >
> > >> > Does it include HBASE-15378?
> > >> >
> > >> > > On Mar 1, 2017, at 5:02 AM, Hef <hef.online@gmail.com> wrote:
> > >> > >
> > >> > > Hi,
> > >> > > I'm encountering strange behavior in MapReduce when using HBase
> > >> > > as the input format. I run my MR tasks multiple times on the same
> > >> > > table and the same dataset, with the same Fuzzy Row Filter
> > >> > > pattern. The Input Records counters shown are not consistent: the
> > >> > > smallest number can be 40% less than the largest one.
> > >> > >
> > >> > > More specifically,
> > >> > > - The table is split into 18 regions, distributed across 3 region
> > >> > > servers. The TTL is set to 10 days for the records, though the
> > >> > > dataset for MR only includes those inserted within 7 days.
> > >> > >
> > >> > > - The row key is defined as:
> > >> > > salt(1 byte) + time_of_hour(4 bytes) + uuid(36 bytes)
> > >> > >
> > >> > >
> > >> > > - The scan is created as below:
> > >> > >
> > >> > > Scan scan = new Scan();
> > >> > > scan.setBatch(100);
> > >> > > scan.setCaching(10000);
> > >> > > scan.setCacheBlocks(false);
> > >> > > scan.setMaxVersions(1);
> > >> > >
> > >> > >
> > >> > > And the row filter for the scan is a FuzzyRowFilter that keeps
> > >> > > only events of a given time_of_hour.
> > >> > >
> > >> > > Everything looks fine, but the results are not what I expect.
> > >> > > When the same task runs 10 times, the Input Records counters show
> > >> > > 6 different numbers, and the final output shows 6 different
> > >> > > results.
> > >> > >
> > >> > > Has anyone ever faced this problem before?
> > >> > > What could be the cause of this inconsistency in HBase scan
> > >> > > results?
> > >> > >
> > >> > > Thanks
> > >> >
> > >>
> >
>
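
For readers following the thread, the row key layout and fuzzy match described in the original question can be sketched roughly as below. This is only an illustration using the JDK, not the poster's code and not HBase's implementation: the class and method names are hypothetical, the big-endian encoding of time_of_hour is an assumption, and matches() is a simplified stand-in for what FuzzyRowFilter does on the region server. The mask convention (0 = byte must match, 1 = byte may be anything) follows the FuzzyRowFilter documentation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: models salt(1 byte) + time_of_hour(4 bytes) + uuid(36 bytes).
final class FuzzyKeySketch {
    static final int SALT_LEN = 1;
    static final int HOUR_LEN = 4;
    static final int UUID_LEN = 36;
    static final int KEY_LEN = SALT_LEN + HOUR_LEN + UUID_LEN;

    // Builds a full row key. Big-endian int for time_of_hour is an assumption.
    static byte[] rowKey(byte salt, int timeOfHour, String uuid) {
        byte[] id = uuid.getBytes(StandardCharsets.US_ASCII);
        if (id.length != UUID_LEN) {
            throw new IllegalArgumentException("uuid must be 36 ASCII characters");
        }
        return ByteBuffer.allocate(KEY_LEN).put(salt).putInt(timeOfHour).put(id).array();
    }

    // Template key for the filter: only the time_of_hour bytes are meaningful.
    static byte[] fuzzyKey(int timeOfHour) {
        return ByteBuffer.allocate(KEY_LEN).put((byte) 0).putInt(timeOfHour).array();
    }

    // Mask: 0 = this byte must match the template, 1 = any value is accepted.
    static byte[] fuzzyMask() {
        byte[] mask = new byte[KEY_LEN];
        Arrays.fill(mask, (byte) 1);                                  // salt and uuid: don't care
        Arrays.fill(mask, SALT_LEN, SALT_LEN + HOUR_LEN, (byte) 0);   // time_of_hour: fixed
        return mask;
    }

    // Simplified client-side check of whether a row key satisfies key + mask.
    static boolean matches(byte[] row, byte[] key, byte[] mask) {
        if (row.length < key.length) {
            return false;
        }
        for (int i = 0; i < key.length; i++) {
            if (mask[i] == 0 && row[i] != key[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String uuid = "123e4567-e89b-12d3-a456-426614174000";
        byte[] key = fuzzyKey(413000);
        byte[] mask = fuzzyMask();
        System.out.println(matches(rowKey((byte) 3, 413000, uuid), key, mask));
        System.out.println(matches(rowKey((byte) 3, 413001, uuid), key, mask));
    }
}
```

With hbase-client on the classpath, a key and mask pair like this would typically be wrapped in a Pair and passed to FuzzyRowFilter, then attached with scan.setFilter(...). As the thread notes, the hbase-client version should match the version the server is running.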
