hbase-user mailing list archives

From Paul Ambrose <pambr...@mac.com>
Subject Re: FilterList and SingleColumnValueFilter
Date Fri, 18 Dec 2009 22:42:12 GMT
I am able to add my own version of FilterList without calling
HbaseObjectWritable.addToMap() and it works like a charm, so I am fine.
I will post the differences after some more testing.

On Dec 18, 2009, at 1:56 PM, Paul Ambrose wrote:

> My earlier suggestion of having SCVF.filterKeyValue() not return INCLUDE
> on column name mismatches was incorrect because INCLUDE is appropriate when SCVF
> is used without FilterList (in the case of MUST_PASS_ONE).  I think the fix is to have
> FilterList evaluate all the filters and not bail early when an INCLUDE is found.
> I will continue to play with it.
> 
> On Dec 18, 2009, at 1:45 PM, bmdevelopment wrote:
> 
>> Hi,
>> Yes, I'll be doing testing on FilterLists in my program in the next few weeks, so
>> will come back with my results afterwards and my recommendations as well. :)
>> Thanks, enjoy the weekend.
>> 
>> stack wrote:
>>> Maybe you two smart fellas can between you make a recommendation and a
>>> patch?
>>> Thanks lads,
>>> St.Ack
>>> On Fri, Dec 18, 2009 at 11:44 AM, bmdevelopment <bmdevelopment@gmail.com> wrote:
>>>> Hi,
>>>> Fyi, I came across similar issues when working on HBASE-1975.
>>>> The return values did not seem to be correct to me either, but when I began
>>>> changing them it seemed to lead to quite involved changes in the SCVF and
>>>> Filter unit tests - something I wanted to avoid.
>>>> In the end, I tried to keep the changes to SCVF as simple as possible.
>>>> At one point, I did also attempt my own version of SCVF and ran into the
>>>> same issue of having to use HbaseObjectWritable.addToMap().
>>>> 
>>>> Now I am beginning to use MUST_PASS_ALL and MUST_PASS_ONE FilterLists of
>>>> SCVFs - maybe similar to what Paul is doing in his original mail. So, if it
>>>> is not working as expected, I will probably need this in the near future as
>>>> well.
>>>> 
>>>> Thanks
>>>> Jeremiah
>>>> 
>>>> 
>>>> Paul Ambrose wrote:
>>>> 
>>>>> Ugh.  I am afraid not.
>>>>> The two changes that I am advocating (that could break someone else, which
>>>>> is of course problematic) are:
>>>>> 
>>>>> 1)  SingleColumnValueFilter.filterKeyValue(KeyValue keyValue)
>>>>> When the column name does not match, the return value should be NEXT_ROW,
>>>>> rather than INCLUDE.  As mentioned earlier, when called by FilterList,
>>>>> the INCLUDE return value discontinues further filter evaluation for a
>>>>> given KeyValue
>>>>> in FilterList. That is problematic because matchedColumn is later checked
>>>>> in filterRow
>>>>> and will always be false for unevaluated filters.
>>>>> 
>>>>> 2) FilterList.filterKeyValue(KeyValue v) returns SKIP and I do not know
>>>>> why.
>>>>> In the case of MUST_PASS_ALL, a filter not returning an INCLUDE
>>>>> should result in a NEXT_ROW (not SKIP) being returned, and at the bottom,
>>>>> an INCLUDE should always be returned (rather than a SKIP).
>>>>> 
>>>>> Here is a dumb question.  A while ago, I tried to add my own filter to the
>>>>> server, but I could not get it going without adding an entry in
>>>>> HbaseObjectWritable.addToMap().  Should I be able to add a filter without
>>>>> this step?  If so, I am content to have my own version of the
>>>>> SingleColumnValueFilter
>>>>> and FilterList and not risk breaking others (though I do think the code is
>>>>> incorrect).
>>>>> 
>>>>> 
>>>>> 
>>>>> On Dec 17, 2009, at 10:27 AM, stack wrote:
>>>>>
>>>>>> On Tue, Dec 15, 2009 at 10:42 PM, Paul Ambrose <pambrose@mac.com> wrote:
>>>>>>
>>>>>>> Hey Michael,
>>>>>>> If hbase-2037 will make it into 0.20.3, I am fine.
>>>>>>
>>>>>> Grand.
>>>>>> Will hbase-2037 fix both issues you describe? (Have you tried it I
>>>>>> wonder?)
>>>>>>
>>>>>> St.Ack
>>>>>>
>>>>>>> If not, I would greatly appreciate you breaking it out for 0.20.3.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Paul
>>>>>>>
>>>>>>> On Dec 15, 2009, at 10:28 PM, stack wrote:
>>>>>>>
>>>>>>>> Paul:
>>>>>>>> I can apply the fix from hbase-2037... I can break it out of the posted
>>>>>>>> patch that's up there.  Just say the word.
>>>>>>>>
>>>>>>>> St.Ack
>>>>>>>>
>>>>>>>> On Tue, Dec 15, 2009 at 4:17 PM, Ram Kulbak <ram.kulbak@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Paul,
>>>>>>>>> I've encountered the same problem. I think it's fixed as part of
>>>>>>>>> https://issues.apache.org/jira/browse/HBASE-2037
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Yoram
>>>>>>>>>
>>>>>>>>> On Wed, Dec 16, 2009 at 10:45 AM, Paul Ambrose <pambrose@mac.com> wrote:
>>>>>>>>>
>>>>>>>>>> I ran into some problems with FilterList and SingleColumnValueFilter.
>>>>>>>>>> I created a FilterList with MUST_PASS_ONE and two SingleColumnValueFilters
>>>>>>>>>> (each testing equality on a different column) and queried some trivial data:
>>>>>>>>>>
>>>>>>>>>> http://pastie.org/744890
>>>>>>>>>>
>>>>>>>>>> The problems that I encountered were twofold:
>>>>>>>>>>
>>>>>>>>>> SingleColumnValueFilter.filterKeyValue() returns ReturnCode.INCLUDE
>>>>>>>>>> if the column names do not match. If FilterList is employed, then when the
>>>>>>>>>> first Filter returns INCLUDE (because the column names do not match), no
>>>>>>>>>> more filters for that KeyValue are evaluated.  That is problematic because
>>>>>>>>>> when filterRow() is finally called for those filters, matchedColumn is never
>>>>>>>>>> found to be true because they were not invoked (due to FilterList exiting
>>>>>>>>>> from the filterList iteration when the name-mismatch INCLUDE was returned).
>>>>>>>>>> The fix (at least for this scenario) is for
>>>>>>>>>> SingleColumnValueFilter.filterKeyValue() to
>>>>>>>>>> return ReturnCode.NEXT_ROW (rather than INCLUDE).
>>>>>>>>>>
>>>>>>>>>> The second problem is at the bottom of FilterList.filterKeyValue()
>>>>>>>>>> where ReturnCode.SKIP is returned if MUST_PASS_ONE is the operator,
>>>>>>>>>> rather than always returning ReturnCode.INCLUDE and then leaving the
>>>>>>>>>> final filter decision to be made by the call to filterRow().  I am sure
>>>>>>>>>> there is a good reason for returning SKIP in other scenarios, but it is
>>>>>>>>>> problematic in mine.
>>>>>>>>>>
>>>>>>>>>> Feedback would be much appreciated.
>>>>>>>>>>
>>>>>>>>>> Paul
>>>>>>>>>>
>> 
> 
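
The short-circuit behavior described in this thread can be sketched with a small, self-contained toy model. This is not HBase code: FilterListSketch, ColumnEquals, rowKept, and demo are hypothetical names, a row is modeled as a plain map of column name to value, and NEXT_ROW is recorded but not acted on. The sketch only assumes what the thread states: a name mismatch returns INCLUDE, a MUST_PASS_ONE list stops at the first INCLUDE, and filterRow() later checks matchedColumn.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FilterListSketch {
    enum ReturnCode { INCLUDE, NEXT_ROW }

    /** Toy stand-in for an equality SingleColumnValueFilter (hypothetical, not the HBase class). */
    static final class ColumnEquals {
        private final String column;
        private final String expected;
        private boolean matchedColumn = false;

        ColumnEquals(String column, String expected) {
            this.column = column;
            this.expected = expected;
        }

        ReturnCode filterKeyValue(String col, String value) {
            if (!column.equals(col)) {
                return ReturnCode.INCLUDE;   // name mismatch: not this filter's column
            }
            if (!expected.equals(value)) {
                return ReturnCode.NEXT_ROW;  // right column, wrong value
            }
            matchedColumn = true;            // right column, right value
            return ReturnCode.INCLUDE;
        }

        /** Returns true when this filter wants the row dropped. */
        boolean filterRow() {
            return !matchedColumn;
        }
    }

    /** Toy MUST_PASS_ONE list: the row is kept if at least one filter accepts it. */
    static boolean rowKept(Map<String, String> row, List<ColumnEquals> filters, boolean evaluateAll) {
        for (Map.Entry<String, String> kv : row.entrySet()) {
            for (ColumnEquals f : filters) {
                ReturnCode rc = f.filterKeyValue(kv.getKey(), kv.getValue());
                if (rc == ReturnCode.INCLUDE && !evaluateAll) {
                    break; // bail early: later filters never see this KeyValue
                }
            }
        }
        for (ColumnEquals f : filters) {
            if (!f.filterRow()) {
                return true;
            }
        }
        return false;
    }

    /** Row {a=1, b=2} against "a equals x" OR "b equals 2"; the second filter should pass. */
    static boolean demo(boolean evaluateAll) {
        Map<String, String> row = new LinkedHashMap<>();
        row.put("a", "1");
        row.put("b", "2");
        List<ColumnEquals> filters = new ArrayList<>();
        filters.add(new ColumnEquals("a", "x"));
        filters.add(new ColumnEquals("b", "2"));
        return rowKept(row, filters, evaluateAll);
    }

    public static void main(String[] args) {
        System.out.println("bail-early keeps row:   " + demo(false));
        System.out.println("evaluate-all keeps row: " + demo(true));
    }
}
```

In this model the bail-early variant drops the row, because the first filter returns INCLUDE for column b on a name mismatch and the second filter never sees the value it would have matched. The evaluate-all variant keeps the row, which corresponds to the fix proposed in the 1:56 PM message above.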

