hbase-issues mailing list archives

From "Evert Arckens (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3562) ValueFilter is being evaluated before performing the column match
Date Fri, 25 Mar 2011 12:23:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13011175#comment-13011175
] 

Evert Arckens commented on HBASE-3562:
--------------------------------------

In ScanQueryMatcher.match I would make the columns.checkColumn call first and execute the
filters only if it returns MatchCode.INCLUDE. I think this would also be more efficient, since
deciding whether to skip a column will usually be faster than evaluating one or more filters.
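
To make the proposed ordering concrete, here is a minimal, self-contained sketch. It is not
the actual ScanQueryMatcher code: Cell, match and the seenByFilter list are hypothetical
stand-ins, and the column check is reduced to a simple set lookup. It only illustrates that
running the cheap column check first keeps the filter from ever seeing cells of unwanted columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class MatchOrderSketch {
    /** Hypothetical stand-in for a KeyValue: a qualifier plus a one-byte value. */
    static final class Cell {
        final String qualifier;
        final byte value;

        Cell(String qualifier, byte value) {
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    /**
     * Returns the qualifiers of the cells that pass both the column check and the filter.
     * The qualifier of every cell handed to the filter is appended to seenByFilter.
     */
    static List<String> match(List<Cell> cells, Set<String> wantedColumns,
                              Predicate<Cell> filter, boolean columnCheckFirst,
                              List<String> seenByFilter) {
        List<String> included = new ArrayList<>();
        for (Cell cell : cells) {
            boolean columnMatches = wantedColumns.contains(cell.qualifier);
            if (columnCheckFirst && !columnMatches) {
                continue; // the cheap check rejects the cell, so the filter never runs
            }
            seenByFilter.add(cell.qualifier);
            if (filter.test(cell) && columnMatches) {
                included.add(cell.qualifier);
            }
        }
        return included;
    }

    public static void main(String[] args) {
        List<Cell> row = Arrays.asList(new Cell("col1", (byte) 1), new Cell("col2", (byte) 2));
        Set<String> wanted = Collections.singleton("col2");
        Predicate<Cell> valueIsTwo = c -> c.value == (byte) 2;

        List<String> seen = new ArrayList<>();
        System.out.println(match(row, wanted, valueIsTwo, true, seen) + " filter saw " + seen);

        seen = new ArrayList<>();
        System.out.println(match(row, wanted, valueIsTwo, false, seen) + " filter saw " + seen);
    }
}
```

Both orderings return col2 in this toy case, but with the column check first the filter is
only handed col2, while with the filter first it is also handed col1.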

However, the code explicitly states:
/**
 * Filters should be checked before checking column trackers. If we do
 * otherwise, as was previously being done, ColumnTracker may increment its
 * counter for even that KV which may be discarded later on by Filter. This
 * would lead to incorrect results in certain cases.
 */

It is not completely clear to me what the exact purpose of the counter on the ColumnTracker
is, or what the problem would be if it were incremented.
Maybe explicitly calling ((ExplicitColumnTracker)columns).doneWithColumn (as is done in
getNextRowOrNextColumn) when a filter skips a column could help here?
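
One plausible reading of the code comment, stated here as an assumption rather than a
confirmed description of HBase internals, is that the counter tracks how many versions of a
column have been seen against the max-versions limit. The toy sketch below uses hypothetical
names (VersionCounterSketch, scan) and plain ints in place of KeyValues; it shows how counting
a version that the filter later discards can use up the limit and hide an older version that
would have passed the filter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

final class VersionCounterSketch {
    /**
     * Scans the versions of a single column, newest first, and returns the values included.
     * The local variable counted is a hypothetical stand-in for the tracker's counter.
     */
    static List<Integer> scan(int[] versionsNewestFirst, int maxVersions,
                              IntPredicate filter, boolean countBeforeFilter) {
        List<Integer> included = new ArrayList<>();
        int counted = 0;
        for (int value : versionsNewestFirst) {
            if (countBeforeFilter) {
                if (counted >= maxVersions) {
                    break; // limit reached, remaining versions are skipped
                }
                counted++; // counted even if the filter drops the value below
                if (filter.test(value)) {
                    included.add(value);
                }
            } else {
                if (!filter.test(value)) {
                    continue; // dropped by the filter without being counted
                }
                if (counted >= maxVersions) {
                    break;
                }
                counted++;
                included.add(value);
            }
        }
        return included;
    }

    public static void main(String[] args) {
        int[] versions = {1, 2}; // newest version holds 1, the older one holds 2
        IntPredicate valueIsTwo = v -> v == 2;
        System.out.println(scan(versions, 1, valueIsTwo, true));
        System.out.println(scan(versions, 1, valueIsTwo, false));
    }
}
```

With a limit of one version, counting before filtering yields an empty result, because the
newest version uses up the limit and is then dropped by the filter. Filtering first yields
the older version with value 2.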

> ValueFilter is being evaluated before performing the column match
> -----------------------------------------------------------------
>
>                 Key: HBASE-3562
>                 URL: https://issues.apache.org/jira/browse/HBASE-3562
>             Project: HBase
>          Issue Type: Bug
>          Components: filters
>    Affects Versions: 0.90.0
>            Reporter: Evert Arckens
>
> When performing a Get operation where both a column and a ValueFilter are specified,
> the ValueFilter is evaluated before the column match is made. This contradicts the javadoc
> of Get.setFilter(): "{@link Filter#filterKeyValue(KeyValue)} is called AFTER all tests
> for ttl, column match, deletes and max versions have been run."
> This is shown in the little test below, which uses a TestComparator extending WritableByteArrayComparable.
> public void testFilter() throws Exception {
> 	byte[] cf = Bytes.toBytes("cf");
> 	byte[] row = Bytes.toBytes("row");
> 	byte[] col1 = Bytes.toBytes("col1");
> 	byte[] col2 = Bytes.toBytes("col2");
> 	Put put = new Put(row);
> 	put.add(cf, col1, new byte[]{(byte)1});
> 	put.add(cf, col2, new byte[]{(byte)2});
> 	table.put(put);
> 	Get get = new Get(row);
> 	get.addColumn(cf, col2); // We only want to retrieve col2
> 	TestComparator testComparator = new TestComparator();
> 	Filter filter = new ValueFilter(CompareOp.EQUAL, testComparator);
> 	get.setFilter(filter);
> 	Result result = table.get(get);
> }
> public class TestComparator extends WritableByteArrayComparable {
>     /**
>      * Nullary constructor, for Writable
>      */
>     public TestComparator() {
>         super();
>     }
>     
>     @Override
>     public int compareTo(byte[] theirValue) {
>         if (theirValue[0] == (byte)1) {
>             // If the column match was done before evaluating the filter, we should never get here.
>             throw new RuntimeException("I only expect (byte)2 in col2, not (byte)1 from col1");
>         }
>         if (theirValue[0] == (byte)2) {
>             return 0;
>         }
>         else return 1;
>     }
> }
> When only one column should be retrieved, this can be worked around by using a
> SingleColumnValueFilter instead of the ValueFilter.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
