hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: FuzzyRowFilter missing keys
Date Thu, 13 Mar 2014 16:22:31 GMT
Looking at TestFuzzyRowFilter.java, there are several cases where the mask
starts with 1.

It would be much easier to diagnose if you come up with a unit test.

Take a look at TestHRegionServerBulkLoad.java since you mentioned you're using
bulk load in the cluster.

Cheers
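A unit test can start one level below HBase. This is a hypothetical, HBase-free model of how FuzzyRowFilter decides whether a single row matches. It uses the rule from the filter's Javadoc: mask byte 0 means the position is fixed and must equal the fuzzy key byte, and 1 means any byte is accepted. The model is applied to the key and mask from the original message quoted below. The class and method names are illustrative only. The model also ignores the filter's seek-hint logic, so it can only check the mask, not the scan.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/**
 * Hypothetical, simplified model of FuzzyRowFilter's match decision (not HBase code).
 * Mask value 0 = fixed byte that must match; 1 = any byte accepted.
 * Seek hints between rows are NOT modeled.
 */
public class FuzzyMatchSketch {

    /** True if rowKey matches fuzzyKey under mask (0 = fixed byte, 1 = wildcard). */
    public static boolean matches(byte[] rowKey, byte[] fuzzyKey, byte[] mask) {
        if (fuzzyKey.length != mask.length) {
            throw new IllegalArgumentException("fuzzy key and mask lengths differ");
        }
        // Simplification of this sketch: rows shorter than the fuzzy key are rejected.
        if (rowKey.length < fuzzyKey.length) {
            return false;
        }
        for (int i = 0; i < mask.length; i++) {
            if (mask[i] == 0 && rowKey[i] != fuzzyKey[i]) {
                return false;
            }
        }
        return true;
    }

    /** Fuzzy key from the original message: 8 placeholder date bytes + "_US_product1___". */
    public static byte[] prefixFuzzyKey() {
        return "\0\0\0\0\0\0\0\0_US_product1___".getBytes(StandardCharsets.US_ASCII);
    }

    /** Mask from the original message: eight 1s (fuzzy date) followed by fifteen 0s. */
    public static byte[] prefixMask() {
        byte[] mask = new byte[23];
        for (int i = 0; i < 8; i++) {
            mask[i] = 1;
        }
        return mask;
    }

    /** Counts how many of 20140101_US_product1___ .. 20140131_US_product1___ match. */
    public static int countJanuaryMatches() {
        int count = 0;
        for (int day = 1; day <= 31; day++) {
            byte[] row = String.format(Locale.ROOT, "201401%02d_US_product1___", day)
                    .getBytes(StandardCharsets.US_ASCII);
            if (matches(row, prefixFuzzyKey(), prefixMask())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // By the match rule alone, every day of the month should be accepted.
        System.out.println(countJanuaryMatches() + " of 31 keys match");
    }
}
```

Suppose this model accepts all 31 keys while a real scan over the bulk-loaded table returns only a few days. In that case the mask itself is probably fine. The problem more likely lies in how the filter seeks between rows, and only a test against an actual HBase region would exercise that.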


On Thu, Mar 13, 2014 at 1:48 AM, Amit Sela <amits@infolinks.com> wrote:

> On the same tables I get missing row keys with a mask on the prefix, but if I
> mask the second part of the key like this:
> 201401\x00\x00\x00\x00\x00_product1___
> and fuzzy info:
> {0,0,0,0,0,0,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0}
> It seems to work....
>
> Has anyone encountered issues with masking the prefix? It seems odd, because the
> Sematext example for using FuzzyRowFilter talks about masking the prefix...
>
>
>
> On Tue, Mar 11, 2014 at 11:07 AM, Amit Sela <amits@infolinks.com> wrote:
>
> > I can't seem to reproduce it in a unit test.
> > The main difference is that I'm using bulk load in the cluster and the Put
> > API in the unit test.
> >
> >
> >
> > On Mon, Mar 10, 2014 at 4:47 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> >
> >> Amit:
> >> Can you put your scenario in a unit test so that it is easier to
> >> pinpoint where the issue is?
> >>
> >> Thanks
> >>
> >>
> >> On Mon, Mar 10, 2014 at 5:25 AM, Amit Sela <amits@infolinks.com> wrote:
> >>
> >> > My table contains keys of this kind over an entire month, but the scan
> >> > returns only some of the days.
> >> > I have 20140101-20140131, but the scan returns only:
> >> > 20140104, 20140110, 20140111, 20140118, 20140120, 20140125, 20140128
> >> >
> >> > Using get or scan with no fuzzy filter works...
> >> >
> >> >
> >> > On Mon, Mar 10, 2014 at 1:59 PM, Bharath Vissapragada <bharathv@cloudera.com> wrote:
> >> >
> >> > > Is it because you fixed the "_US_product1___" part of the key? From your
> >> > > definition of the filter, you should get as output all keys of the form
> >> > > "yyyyMMdd_US_product1___".
> >> > > Can you share a key that's of this format and missing from the output?
> >> > >
> >> > >
> >> > > On Mon, Mar 10, 2014 at 3:38 PM, Amit Sela <amits@infolinks.com> wrote:
> >> > >
> >> > > > Hi all,
> >> > > > I'm using HBase 0.94.12 + Hadoop 1.0.4.
> >> > > > When I use FuzzyRowFilter, it looks like the scan is missing keys.
> >> > > >
> >> > > > Row key structure:
> >> > > > yyyyMMdd_Country_Product_Category1_Category2_
> >> > > > Where the date is mandatory and all other fields may be "".
> >> > > > Examples:
> >> > > > 20140101_US_product1___
> >> > > > 20140102__product1_bla__
> >> > > > 20140103_____
> >> > > >
> >> > > > Supplying the filter with row key:
> >> > > > \x00\x00\x00\x00\x00\x00\x00\x00_US_product1___
> >> > > > and fuzzy info:
> >> > > > {1,1,1,1,1,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0}
> >> > > >
> >> > > > Over a range of a month, although the key exists for every day in the
> >> > > > month, I get results for only some of the days.
> >> > > >
> >> > > > I tried it on another table and the same thing happens. I'll mention that
> >> > > > both tables have keys that start with yyyyMMdd.
> >> > > >
> >> > > > Has anyone had a similar issue before? I saw something in the mailing list
> >> > > > archives, but no results there...
> >> > > >
> >> > > > Thanks,
> >> > > > Amit.
> >> > > >
> >> > >
> >> > >
> >> > >
> >> > > --
> >> > > Bharath Vissapragada
> >> > > <http://www.cloudera.com>
> >> > >
> >> >
> >>
> >
> >
>
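Writing the masks in this thread by hand is error-prone, because each fuzzy key must be exactly as long as its mask (23 bytes here). This is a minimal sketch that assumes a hypothetical helper, not part of HBase. It turns a template string, where '?' marks a fuzzy byte, into the key and mask. In 0.94 the resulting pair would be passed to FuzzyRowFilter as a `List<Pair<byte[], byte[]>>`. That wiring is left out so the example runs without HBase on the classpath.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical helper (not part of HBase): builds a FuzzyRowFilter-style key and mask
 * from a template such as "????????_US_product1___".
 * Each '?' becomes a 0x00 placeholder with mask 1 (any byte); every other character is
 * copied as a fixed byte with mask 0. Assumes ASCII templates where '?' is never a
 * literal key byte.
 */
public class FuzzyTemplate {
    public final byte[] key;
    public final byte[] mask;

    public FuzzyTemplate(String template) {
        byte[] chars = template.getBytes(StandardCharsets.US_ASCII);
        key = new byte[chars.length];
        mask = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == '?') {
                key[i] = 0;   // placeholder; ignored because the position is fuzzy
                mask[i] = 1;  // 1 = any byte accepted at this position
            } else {
                key[i] = chars[i];
                mask[i] = 0;  // 0 = byte must match exactly
            }
        }
    }

    public static void main(String[] args) {
        // Original mask: the whole yyyyMMdd date is fuzzy.
        FuzzyTemplate prefix = new FuzzyTemplate("????????_US_product1___");
        // Follow-up mask: year and month fixed, positions 6-10 fuzzy.
        FuzzyTemplate middle = new FuzzyTemplate("201401?????_product1___");
        System.out.println(Arrays.toString(prefix.mask));
        System.out.println(Arrays.toString(middle.mask));
    }
}
```

The second template shows that the follow-up mask fixes the year and month but leaves positions 6 to 10 fuzzy. Those positions are the day, the separator, and the two country bytes.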
