hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Why doesn't my regex work in hbase rowfilter with my scan?
Date Fri, 14 Jul 2017 02:43:41 GMT
I didn't mean that you cannot have only one filter in a filter list.

Please take a look
at hbase-server/src/test/java/org/apache/hadoop/hbase/filter/TestFilter.java
where RegexStringComparator is used.

It may take you less time if you follow the example there and debug your
regex using sample data in the cited format.
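
Here is a minimal sketch of that kind of local check, using plain
java.util.regex so it runs without a cluster. It assumes the comparator
behaves like an unanchored find(), i.e. the pattern may match anywhere in
the row key; please verify that against the HBase version you run. The
class name, the helper method and the "1100_" sample key are made up for
illustration only.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of HBase: tries a regex against sample row keys.
class RowKeyRegexCheck {

    // Assumption: mimics an unanchored find(), so the pattern may match
    // anywhere inside the row key rather than the whole key.
    static boolean matches(String regex, String rowKey) {
        return Pattern.compile(regex).matcher(rowKey).find();
    }

    public static void main(String[] args) {
        String original = "100_.*_\\d{10}";   // pattern from the question
        String anchored = "^100_.*_\\d{10}$"; // same pattern, anchored at both ends

        // Sample keys in the hash_servername_timestamp format.
        String[] sampleKeys = {
            "100_myserver.mydomain.com_1234567890",
            "0_myserver.mydomain.com_1234567890",
            "1100_myserver.mydomain.com_1234567890" // made-up key, shows the effect of anchoring
        };

        for (String key : sampleKeys) {
            System.out.printf("%-40s original=%b anchored=%b%n",
                    key, matches(original, key), matches(anchored, key));
        }
    }
}
```

If the pattern accepts and rejects the sample keys as expected here, the
regex itself is probably not the problem and the rest of the job setup is
worth a look.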

Cheers

On Thu, Jul 13, 2017 at 5:40 PM, S L <slouie.at.work@gmail.com> wrote:

> Thanks Ted.  I had other filters in there but wanted to make it simple and
> just have one filter for now and then add them one by one until I get
> everything working.
>
> So I can't have just one filter in a filter list?  Kind of makes it hard to
> debug if I have multiple filters that might be bad (or just one bad and 9
> good but can't figure out which is the bad one).
>
> On Thu, Jul 13, 2017 at 5:34 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>
> > rowFilter is added to filter list which doesn't contain other filters.
> >
> > Maybe the snippet doesn't contain all the code in your class?
> >
> > On Thu, Jul 13, 2017 at 5:26 PM, S L <slouie.at.work@gmail.com> wrote:
> >
> > > I don't understand why my regex doesn't work when scanning hbase.
> > > Everything looks good to me but for some reason, it's returning all
> > > keys when it should just return the ones I'm requesting.
> > >
> > > Scan scan = new Scan();
> > > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
> > > scan.setCaching(limit);
> > > scan.setCacheBlocks(false);
> > > scan.setTimeRange(start, end);
> > > FilterList filters = new FilterList();
> > > Filter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
> > >     new RegexStringComparator("100_.*_\\d{10}"));
> > > filters.addFilter(rowFilter);
> > > scan.setFilter(filters);
> > >
> > > TableMapReduceUtil.initTableMapperJob(tableName, scan, MTTRMapper.class,
> > >     Text.class, IntWritable.class, job);
> > >
> > > The rowkey is stored as a string in hbase.  The rowkey is in the format
> > > of hash_servername_timestamp, e.g.
> > >
> > >     0_myserver.mydomain.com_1234567890
> > >
> > > The hash can be any number from 0-199.  In the above filter, I just
> > > want to get all elements with hash = 100 but for some reason, the scan
> > > job appears to return other rowkeys in addition to the ones with
> > > hash = 100.
> > >
> > > I've tried this with jar versions 1.0.1 and 1.2.0-cdh5.7.2.  What am I
> > > doing wrong that's making the regex not work?
> > >
> >
>
