hbase-user mailing list archives

From S L <slouie.at.w...@gmail.com>
Subject Why doesn't my regex work in hbase rowfilter with my scan?
Date Fri, 14 Jul 2017 00:26:34 GMT
I don't understand why my regex doesn't work when scanning HBase.
Everything looks good to me, but for some reason it's returning all keys
when it should return only the ones I'm requesting.

Scan scan = new Scan();
scan.addColumn(Bytes.toBytes("raw_data"), Bytes.toBytes(fileType));
scan.setCaching(limit);
scan.setCacheBlocks(false);
scan.setTimeRange(start, end);

FilterList filters = new FilterList();
Filter rowFilter = new RowFilter(CompareFilter.CompareOp.EQUAL,
        new RegexStringComparator("100_.*_\\d{10}"));
filters.addFilter(rowFilter);
scan.setFilter(filters);

TableMapReduceUtil.initTableMapperJob(tableName, scan, MTTRMapper.class,
        Text.class, IntWritable.class, job);

The rowkey is stored as a string in HBase, in the format
hash_servername_timestamp, e.g.

    0_myserver.mydomain.com_1234567890

The hash can be any number from 0 to 199. With the filter above, I only want
the rows whose hash is 100, but for some reason the scan job also returns
rowkeys with other hash values.
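One way to sanity-check the pattern itself, independent of HBase, is a plain java.util.regex test against sample keys in the format above. The sketch assumes the comparator evaluates the pattern roughly the way Matcher.find() does (a substring search) rather than Matcher.matches() (whole-key match), so it prints both. The class name, method names, and the fourth key (a hypothetical server name containing "100_") are made up for illustration.

```java
import java.util.regex.Pattern;

public class RowKeyRegexCheck {
    // Same pattern string as the one passed to RegexStringComparator above.
    static final Pattern HASH_100 = Pattern.compile("100_.*_\\d{10}");

    // Substring search: true if the pattern occurs anywhere in the key.
    static boolean found(String rowKey) {
        return HASH_100.matcher(rowKey).find();
    }

    // Whole-key match: true only if the entire key fits the pattern.
    static boolean matchesWhole(String rowKey) {
        return HASH_100.matcher(rowKey).matches();
    }

    public static void main(String[] args) {
        String[] keys = {
            "100_myserver.mydomain.com_1234567890",
            "0_myserver.mydomain.com_1234567890",
            "199_myserver.mydomain.com_1234567890",
            "5_db100_host.mydomain.com_1234567890" // hypothetical key: hash 5, server name contains "100_"
        };
        for (String key : keys) {
            System.out.printf("%-40s find=%b matches=%b%n",
                    key, found(key), matchesWhole(key));
        }
    }
}
```

For the first three keys find() and matches() agree; they only differ for a key where "100_" appears somewhere other than the start, which is where a leading ^ anchor would make a difference under substring search.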

I've tried this with jar versions 1.0.1 and 1.2.0-cdh5.7.2.  What am I
doing wrong that's making the regex not work?
