hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: FilterList and SingleColumnValueFilter
Date Fri, 18 Dec 2009 21:00:33 GMT
Maybe you two smart fellas can between you make a recommendation and a
patch?
Thanks lads,
St.Ack

On Fri, Dec 18, 2009 at 11:44 AM, bmdevelopment <bmdevelopment@gmail.com> wrote:

> Hi,
> Fyi, I came across similar issues when working on HBASE-1975.
> The return values did not seem to be correct to me either, but when I began
> changing them it seemed to lead to quite involved changes in the SCVF and
> Filter unit tests - something I wanted to avoid.
> In the end, I tried to keep the changes to SCVF as simple as possible.
> At one point, I did also attempt my own version of SCVF and ran into the
> same issue of having to use HbaseObjectWritable.addToMap().
>
> Now I am beginning to use MUST_PASS_ALL and MUST_PASS_ONE FilterList of
> SCVFs - maybe similar to what Paul is doing in his original mail. So, if it
> is not working as expected, I will probably need this in the near future as
> well.
>
> Thanks
> Jeremiah
>
>
> Paul Ambrose wrote:
>
>> Ugh.  I am afraid not.
>> The two changes that I am advocating (that could break someone else, which
>> is of course problematic) are:
>>
>> 1)  SingleColumnValueFilter.filterKeyValue(KeyValue keyValue)
>> When the column name does not match, the return value should be NEXT_ROW,
>> rather than INCLUDE.  As mentioned earlier, when called by FilterList,
>> the INCLUDE return value discontinues further filter evaluation for a
>> given KeyValue in FilterList. That is problematic because matchedColumn
>> is later checked in filterRow and will always be false for unevaluated
>> filters.
>>
>> 2) FilterList.filterKeyValue(KeyValue v) returns SKIP and I do not know why.
>> In the case of MUST_PASS_ALL, a filter not returning an INCLUDE
>> should result in a NEXT_ROW (not SKIP) being returned, and at the bottom,
>> an INCLUDE should always be returned (rather than a SKIP).
>>
>> Here is a dumb question.  A while ago, I tried to add my own filter to the
>> server, but I could not get it going without adding an entry in
>> HbaseObjectWritable.addToMap().  Should I be able to add a filter without
>> this step?  If so, I am content to have my own version of the
>> SingleColumnValueFilter and FilterList and not risk breaking others
>> (though I do think the code is incorrect).
>>
>>
>>
>> On Dec 17, 2009, at 10:27 AM, stack wrote:
>>
>>> On Tue, Dec 15, 2009 at 10:42 PM, Paul Ambrose <pambrose@mac.com> wrote:
>>>
>>>> Hey Michael,
>>>>
>>>> If hbase-2037 will make it into 0.20.3, I am fine.
>>>
>>> Grand.
>>>
>>> Will hbase-2037 fix both issues you describe? (Have you tried it I
>>> wonder?)
>>>
>>> St.Ack
>>>
>>>> If not, I would greatly appreciate you breaking it out for 0.20.3.
>>>>
>>>> Thanks,
>>>> Paul
>>>>
>>>> On Dec 15, 2009, at 10:28 PM, stack wrote:
>>>>
>>>>> Paul:
>>>>>
>>>>> I can apply the fix from hbase-2037... I can break it out of the posted
>>>>> patch that's up there.  Just say the word.
>>>>>
>>>>> St.Ack
>>>>>
>>>>> On Tue, Dec 15, 2009 at 4:17 PM, Ram Kulbak <ram.kulbak@gmail.com> wrote:
>>>>>
>>>>>> Hi Paul,
>>>>>>
>>>>>> I've encountered the same problem. I think it's fixed as part of
>>>>>> https://issues.apache.org/jira/browse/HBASE-2037
>>>>>>
>>>>>> Regards,
>>>>>> Yoram
>>>>>>
>>>>>> On Wed, Dec 16, 2009 at 10:45 AM, Paul Ambrose <pambrose@mac.com> wrote:
>>>>>>
>>>>>>> I ran into some problems with FilterList and SingleColumnValueFilter.
>>>>>>>
>>>>>>> I created a FilterList with MUST_PASS_ONE and two SingleColumnValueFilters
>>>>>>> (each testing equality on a different column) and queried some trivial
>>>>>>> data:
>>>>>>>
>>>>>>> http://pastie.org/744890
>>>>>>>
>>>>>>> The problems that I encountered were two-fold:
>>>>>>>
>>>>>>> SingleColumnValueFilter.filterKeyValue() returns ReturnCode.INCLUDE
>>>>>>> if the column names do not match. If FilterList is employed, then when
>>>>>>> the first Filter returns INCLUDE (because the column names do not match),
>>>>>>> no more filters for that KeyValue are evaluated.  That is problematic
>>>>>>> because when filterRow() is finally called for those filters,
>>>>>>> matchedColumn is never found to be true because they were not invoked
>>>>>>> (due to FilterList exiting from the filterList iteration when the
>>>>>>> name-mismatch INCLUDE was returned).  The fix (at least for this
>>>>>>> scenario) is for SingleColumnValueFilter.filterKeyValue() to
>>>>>>> return ReturnCode.NEXT_ROW (rather than INCLUDE).
>>>>>>>
>>>>>>> The second problem is at the bottom of FilterList.filterKeyValue()
>>>>>>> where ReturnCode.SKIP is returned if MUST_PASS_ONE is the operator,
>>>>>>> rather than always returning ReturnCode.INCLUDE and then leaving the
>>>>>>> final filter decision to be made by the call to filterRow().  I am sure
>>>>>>> there is a good reason for returning SKIP in other scenarios, but it is
>>>>>>> problematic in mine.
>>>>>>>
>>>>>>> Feedback would be much appreciated.
>>>>>>>
>>>>>>> Paul
>>>>>>>
>>>>
>>
>
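
For anyone following the thread, the first problem Paul describes (a MUST_PASS_ONE list stopping at the first INCLUDE, so later filters never see their column) can be reproduced in a toy model. This is only a sketch and not the HBase source: the class ColumnEqualsFilter, the helper dropsRow(), the string-based cells, and the onColumnMismatch switch are all hypothetical stand-ins invented for illustration, and the model ignores the list's own return code (the second problem) entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the interaction described in the thread. NOT HBase code.
class FilterListShortCircuitDemo {
    enum ReturnCode { INCLUDE, SKIP, NEXT_ROW }

    // Hypothetical stand-in for an equality SingleColumnValueFilter.
    static class ColumnEqualsFilter {
        private final String column;
        private final String expected;
        private final ReturnCode onColumnMismatch; // the behaviour under debate
        private boolean matchedColumn = false;

        ColumnEqualsFilter(String column, String expected, ReturnCode onColumnMismatch) {
            this.column = column;
            this.expected = expected;
            this.onColumnMismatch = onColumnMismatch;
        }

        ReturnCode filterKeyValue(String kvColumn, String kvValue) {
            if (!column.equals(kvColumn)) {
                return onColumnMismatch; // not this filter's column
            }
            if (expected.equals(kvValue)) {
                matchedColumn = true;
                return ReturnCode.INCLUDE;
            }
            return ReturnCode.NEXT_ROW;
        }

        // true means "drop the row" (assumed: drop unless the column matched).
        boolean filterRow() {
            return !matchedColumn;
        }
    }

    // Runs two filters (a == "1", b == "y") as a MUST_PASS_ONE list over the
    // row {a: "x", b: "y"} and reports whether the row ends up dropped.
    static boolean dropsRow(ReturnCode onColumnMismatch) {
        List<ColumnEqualsFilter> filters = new ArrayList<>();
        filters.add(new ColumnEqualsFilter("a", "1", onColumnMismatch));
        filters.add(new ColumnEqualsFilter("b", "y", onColumnMismatch));

        Map<String, String> row = new LinkedHashMap<>(); // keeps column order
        row.put("a", "x");
        row.put("b", "y");

        for (Map.Entry<String, String> kv : row.entrySet()) {
            for (ColumnEqualsFilter f : filters) {
                // MUST_PASS_ONE as described: stop at the first INCLUDE.
                if (f.filterKeyValue(kv.getKey(), kv.getValue()) == ReturnCode.INCLUDE) {
                    break;
                }
            }
        }
        // MUST_PASS_ONE at row level: drop only if every filter drops the row.
        for (ColumnEqualsFilter f : filters) {
            if (!f.filterRow()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("mismatch -> INCLUDE:  row dropped = " + dropsRow(ReturnCode.INCLUDE));
        System.out.println("mismatch -> NEXT_ROW: row dropped = " + dropsRow(ReturnCode.NEXT_ROW));
    }
}
```

In this model the row is dropped when a column mismatch returns INCLUDE, because the first filter's INCLUDE on column b ends the loop before the second filter can record its match. With NEXT_ROW on mismatch, as Paul proposes, the second filter is reached and the row is kept. Whether NEXT_ROW is the right code for the real scanner is exactly what the thread leaves open.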
