hbase-dev mailing list archives

From Lars George <lars.geo...@gmail.com>
Subject SkipFilter Documentation
Date Tue, 29 Mar 2011 10:20:06 GMT
Hi,

Further along the lines of the filters I found this:

 * A wrapper filter that filters an entire row if any of the KeyValue checks do
 * not pass.
 * <p>
 * For example, if all columns in a row represent weights of different things,
 * with the values being the actual weights, and we want to filter out the
 * entire row if any of its weights are zero.  In this case, we want to prevent
 * rows from being emitted if a single key is filtered.  Combine this filter
 * with a {@link ValueFilter}:
 * <p>
 * <pre>
 * scan.setFilter(new SkipFilter(new ValueFilter(CompareOp.EQUAL,
 *     new BinaryComparator(Bytes.toBytes(0))));
 * </code>
 * Any row which contained a column whose value was 0 will be filtered out.
 * Without this filter, the other non-zero valued columns in the row would still
 * be emitted.

from the SkipFilter class. This is not right because the ValueFilter
is using EQUAL. Filters in general are difficult to explain because,
per their contract, they are assumed to "filter out" details. But the
CompareFilter based classes do the opposite: they include the cells
that match when using EQUAL, and exclude them when using NOT_EQUAL. So

new ValueFilter(CompareOp.EQUAL, new BinaryComparator(Bytes.toBytes(0)));

actually includes all columns that contain the value 0, and not, as
the text implies, the opposite, i.e. all non-zero valued columns.
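
To make the include-on-match behaviour concrete, here is a minimal
plain-Java sketch. It is only a model of the semantics described above,
not the HBase API: the class ValueFilterModel, its Op enum and the
methods includes() and scanRow() are hypothetical names, and the column
values are simplified to ints.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the include-on-match behaviour; NOT the HBase API.
// All names here are hypothetical and exist only for illustration.
final class ValueFilterModel {
    enum Op { EQUAL, NOT_EQUAL }

    // Returns true when the cell is INCLUDED, i.e. when "value <op> operand" holds.
    static boolean includes(Op op, int operand, int value) {
        return op == Op.EQUAL ? value == operand : value != operand;
    }

    // Applies the filter cell by cell and returns the values that survive.
    static List<Integer> scanRow(Op op, int operand, int[] row) {
        List<Integer> kept = new ArrayList<>();
        for (int value : row) {
            if (includes(op, operand, value)) {
                kept.add(value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[] weights = {3, 0, 7};
        System.out.println(scanRow(Op.EQUAL, 0, weights));     // prints [0]
        System.out.println(scanRow(Op.NOT_EQUAL, 0, weights)); // prints [3, 7]
    }
}
```

In this model EQUAL keeps only the zero-valued column, which is the
reverse of what the Javadoc text suggests.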

The SkipFilter, on the other hand, drops every row for which the
wrapped filter's filterKeyValue() returns anything but
ReturnCode.INCLUDE for any of its KVs. This means the wrapped filter
has to check whether a column should NOT be included, and if that is
the case the entire row is dropped. The comparison operator therefore
needs to be reversed, like so:

scan.setFilter(new SkipFilter(new ValueFilter(CompareOp.NOT_EQUAL,
    new BinaryComparator(Bytes.toBytes(0)))));

This now "includes" all non-zero columns but skips the zero-valued
ones, triggering the row filtering in the SkipFilter as expected.
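
The row-dropping logic can be sketched in plain Java as well. Again this
is only a model under simplifying assumptions, not the HBase API: the
class SkipFilterModel and its methods are hypothetical names, the
values are ints, and the wrapped filter is reduced to an
include/reject decision per cell.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java model of the wrapper logic; NOT the HBase API.
// All names here are hypothetical and exist only for illustration.
final class SkipFilterModel {
    enum Op { EQUAL, NOT_EQUAL }

    // Wrapped filter: true means the cell would be INCLUDED.
    static boolean innerIncludes(Op op, int operand, int value) {
        return op == Op.EQUAL ? value == operand : value != operand;
    }

    // Wrapper: the row survives only if the wrapped filter includes every cell.
    static boolean rowSurvives(Op op, int operand, int[] row) {
        for (int value : row) {
            if (!innerIncludes(op, operand, value)) {
                return false; // a single rejected cell drops the whole row
            }
        }
        return true;
    }

    // Returns the indexes of the rows that would be emitted by the scan.
    static List<Integer> survivingRows(Op op, int operand, int[][] rows) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) {
            if (rowSurvives(op, operand, rows[i])) {
                kept.add(i);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        int[][] rows = { {3, 5, 7}, {3, 0, 7}, {0, 0, 0} };
        // Reversed operator: only the row without any zero weight survives.
        System.out.println(survivingRows(Op.NOT_EQUAL, 0, rows)); // prints [0]
        // Operator from the Javadoc: only the all-zero row survives.
        System.out.println(survivingRows(Op.EQUAL, 0, rows));     // prints [2]
    }
}
```

In this model the EQUAL variant from the Javadoc emits only rows in
which every value is zero, while NOT_EQUAL emits only rows without a
zero value, which is what the documentation claims to want.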

I tried it with:
https://github.com/larsgeorge/hbase-book/blob/master/ch04/src/main/java/SkipFilterExample.java

And got

Adding rows to table...
Results of scan:
KV: row-10/colfam1:col-00/0/Put/vlen=9, Value: val-10.00

Is it just me or is this logic brain dead?

Lars
