hbase-user mailing list archives

From Hef <hef.onl...@gmail.com>
Subject Re: HBase scan returns inconsistent results on multiple runs for same dataset
Date Fri, 03 Mar 2017 04:21:08 GMT
I ran the tests with the following scenarios:
1. Ran the tasks with the old client 5 times and got 5 different values for
the 'mapping input records' counter, ranging from 470k to 630k.
2. Ran the tasks with the new client 5 times and got the same value every
time, 2.6m, which is much larger than any value from step 1.
3. RegionServers were not restarted during the tests.
4. Scan criteria were consistent during the tests.



On Fri, Mar 3, 2017 at 12:04 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Since the cache for ClientScanner might or might not have been empty during
> your test runs, it is hard to tell whether you hit the bug described by
> HBASE-15378.
>
> I would suggest you upgrade to a release with HBASE-15378.
>
> On Thu, Mar 2, 2017 at 7:59 PM, Hef <hef.online@gmail.com> wrote:
>
> > Thanks for the hint, which led me to investigate the client side and
> > finally resolve the problem.
> >
> > I reviewed the code and found that my project was using an old version of
> > hbase-client, 1.0.0-cdh5.6.1. After updating to 1.2.0-cdh5.9.0, which
> > matches the version the server is running, my tasks work correctly.
> >
> > I looked into the source of HBase 1.2.0-cdh5.9.0, and HBASE-15378 is not
> > patched in. I also went through all the release notes from CDH HBase 5.6
> > to 5.9, and nothing about this inconsistent scan behavior is mentioned.
> > Though the problem is resolved for now, I have no idea what the root
> > cause actually is, or whether it will come back without HBASE-15378 if my
> > dataset grows larger.
> >
> >
> >
> > On Thu, Mar 2, 2017 at 12:09 AM, Sean Busbey <busbey@apache.org> wrote:
> >
> > > The place to check for JIRAs included on top of those in the ASF release
> > > is here:
> > >
> > > http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.9.1.releasenotes.html
> > >
> > > HBASE-15378 is not in CDH5.9.1.
> > >
> > > On Wed, Mar 1, 2017 at 9:58 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > > I don't see it here:
> > > >
> > > > http://archive.cloudera.com/cdh5/cdh/5/hbase-1.2.0-cdh5.9.1.CHANGES.txt?_ga=1.10311413.1914112506.1454459553
> > > >
> > > > On Wed, Mar 1, 2017 at 5:46 AM, Hef <hef.online@gmail.com> wrote:
> > > >
> > > >> I'm using CDH 5.9; the documentation shows its HBase version is
> > > >> hbase-1.2.0+cdh5.9.1+222
> > > >> (https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh_package_tarball_59.html).
> > > >> I have no idea whether HBASE-15378 is included.
> > > >>
> > > >> On Wed, Mar 1, 2017 at 9:33 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > > >>
> > > >> > Which hbase version are you using ?
> > > >> >
> > > >> > Does it include HBASE-15378 ?
> > > >> >
> > > >> > > On Mar 1, 2017, at 5:02 AM, Hef <hef.online@gmail.com> wrote:
> > > >> > >
> > > >> > > Hi,
> > > >> > > I'm encountering strange behavior in MapReduce when using HBase as
> > > >> > > the input format. I run my MR tasks multiple times on the same
> > > >> > > table and the same dataset, with the same Fuzzy Row Filter pattern.
> > > >> > > The Input Records counters are not consistent: the smallest number
> > > >> > > can be 40% less than the largest one.
> > > >> > >
> > > >> > > More specifically,
> > > >> > > - The table is split into 18 regions, distributed across 3 region
> > > >> > > servers. The TTL is set to 10 days for the records, though the
> > > >> > > dataset for MR only includes those inserted within the last 7 days.
> > > >> > >
> > > >> > > - The row key is defined as:
> > > >> > > salt(1 byte) + time_of_hour(4 bytes) + uuid(36 bytes)
> > > >> > >
> > > >> > >
> > > >> > > - The scan is created as below:
> > > >> > >
> > > >> > > Scan scan = new Scan();
> > > >> > > scan.setBatch(100);
> > > >> > > scan.setCaching(10000);
> > > >> > > scan.setCacheBlocks(false);
> > > >> > > scan.setMaxVersions(1);
> > > >> > >
> > > >> > >
> > > >> > > And the row filter for the scan is a FuzzyRowFilter that matches
> > > >> > > only events of a given time_of_hour.
> > > >> > >
> > > >> > > Everything looks fine, but the results are not what I expect.
> > > >> > > When the same task runs 10 times, the Input Records counters show 6
> > > >> > > different numbers, and the final output shows 6 different results.
> > > >> > >
> > > >> > > Has anyone ever faced this problem before?
> > > >> > > What could be the cause of this inconsistency in the HBase scan
> > > >> > > results?
> > > >> > >
> > > >> > > Thanks
> > > >> >
> > > >>
> > >
> >
>
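
For reference, the fuzzy pattern described in the original message (fixed
time_of_hour, any salt, any uuid) could be built roughly as below. This is
only a sketch: the class and method names are hypothetical, and the encoding
of time_of_hour as a big-endian 4-byte int is an assumption, since the thread
does not say how it is serialized. It uses plain Java only; in HBase 1.x the
two arrays would then be handed to FuzzyRowFilter as a Pair of (key, mask),
where a mask byte of 0 means "must match" and 1 means "any value".

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not code from the thread.
class FuzzyKeySketch {
    static final int SALT_LEN = 1;   // salt(1 byte)
    static final int TIME_LEN = 4;   // time_of_hour(4 bytes)
    static final int UUID_LEN = 36;  // uuid(36 bytes)
    static final int KEY_LEN = SALT_LEN + TIME_LEN + UUID_LEN;

    // Row key template: only the time_of_hour bytes are meaningful.
    // Assumes time_of_hour is stored as a big-endian int.
    static byte[] fuzzyKey(int timeOfHour) {
        byte[] key = new byte[KEY_LEN];
        ByteBuffer.wrap(key, SALT_LEN, TIME_LEN).putInt(timeOfHour);
        return key;
    }

    // Mask: 1 = position may hold any byte, 0 = position must match the key.
    static byte[] fuzzyMask() {
        byte[] mask = new byte[KEY_LEN];
        Arrays.fill(mask, (byte) 1);
        Arrays.fill(mask, SALT_LEN, SALT_LEN + TIME_LEN, (byte) 0);
        return mask;
    }

    public static void main(String[] args) {
        byte[] key = fuzzyKey(413_000);  // arbitrary example value
        byte[] mask = fuzzyMask();
        System.out.println("key length:  " + key.length);
        System.out.println("mask length: " + mask.length);
        // With hbase-client on the classpath, the pair would be used as:
        //   scan.setFilter(new FuzzyRowFilter(
        //       Arrays.asList(new Pair<>(key, mask))));
    }
}
```

Keeping the key and mask construction in one place makes it easier to confirm
that the scan criteria really are identical across runs.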
