hbase-user mailing list archives

From S L <slouie.at.w...@gmail.com>
Subject Re: Why doesn't my regex work in hbase rowfilter with my scan?
Date Fri, 14 Jul 2017 00:40:21 GMT
Thanks Ted.  I had other filters in there but wanted to make it simple and
just have one filter for now and then add them one by one until I get
everything working.

So I can't have just one filter in a filter list?  That makes it hard to
debug when I have multiple filters and one might be bad (say one bad and 9
good, and I can't figure out which is the bad one).

On Thu, Jul 13, 2017 at 5:34 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> rowFilter is added to filter list which doesn't contain other filters.
>
> Maybe the snippet doesn't contain all the code in your class?
>
> On Thu, Jul 13, 2017 at 5:26 PM, S L <slouie.at.work@gmail.com> wrote:
>
> > I don't understand why my regex doesn't work when scanning hbase.
> > Everything looks good to me but for some reason, it's returning all keys
> > when it should just return the ones I'm requesting.
> >
> > Scan scan = new Scan();
> > scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
> > scan.setCaching(limit);
> > scan.setCacheBlocks(false);
> > scan.setTimeRange(start, end);
> > FilterList filters = new FilterList();
> > Filter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
> >     new RegexStringComparator("100_.*_\\d{10}"));
> > filters.addFilter(rowFilter);
> > scan.setFilter(filters);
> >
> > TableMapReduceUtil.initTableMapperJob(tableName, scan, MTTRMapper.class,
> > Text.class, IntWritable.class, job);
> >
> > The rowkey is stored as a string in hbase.  The rowkey is in the format of
> > hash_servername_timestamp, e.g.
> >
> >     0_myserver.mydomain.com_1234567890
> >
> > The hash can be any number from 0-199.  In the above filter, I just want to
> > get all elements with hash = 100 but for some reason, the scan job appears
> > to return other rowkeys in addition to the ones with hash = 100.
> >
> > I've tried this with jar versions 1.0.1 and 1.2.0-cdh5.7.2.  What am I
> > doing wrong that's making the regex not work?
> >
>
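
One thing that might be worth ruling out while debugging filters one at a time is whether the regex itself matches only the intended keys. The sketch below uses plain java.util.regex, with no HBase on the classpath, to run the pattern from the question against a few sample row keys. It assumes that RegexStringComparator does substring-style matching (Matcher.find) rather than whole-string matching; that assumption should be checked against the HBase version in use. The class name, the anchored variant of the pattern, and the third sample key are hypothetical and only for illustration. This is not claimed to be the cause of the behaviour described above.

```java
import java.util.regex.Pattern;

public class RowKeyRegexCheck {
    // The pattern from the question, exactly as written (unanchored).
    public static final Pattern UNANCHORED = Pattern.compile("100_.*_\\d{10}");

    // Hypothetical anchored variant: the key must start with "100_"
    // and end with the 10-digit timestamp.
    public static final Pattern ANCHORED = Pattern.compile("^100_.*_\\d{10}$");

    // Substring-style matching. ASSUMPTION: this mirrors how
    // RegexStringComparator evaluates a row key.
    public static boolean found(Pattern pattern, String rowKey) {
        return pattern.matcher(rowKey).find();
    }

    public static void main(String[] args) {
        String[] sampleKeys = {
            "100_myserver.mydomain.com_1234567890", // hash = 100, wanted
            "0_myserver.mydomain.com_1234567890",   // hash = 0, not wanted
            "0_host100_a.mydomain.com_1234567890"   // hypothetical: "100_" inside the server name
        };
        for (String key : sampleKeys) {
            System.out.println(key
                + "  unanchored=" + found(UNANCHORED, key)
                + "  anchored=" + found(ANCHORED, key));
        }
    }
}
```

With substring matching, the unanchored pattern also accepts the third key, because "100_" appears inside the server name; the anchored variant rejects it. If the local check behaves as expected for real keys from the table, the regex is probably not the problem and the next thing to look at is how the scan and filter reach the job.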
