hbase-dev mailing list archives

From NNever <nnever...@gmail.com>
Subject Re: Regarding order of filters in FilterList
Date Tue, 05 Jun 2012 06:43:39 GMT
Hi Anoop,

I'm an ordinary HBase user.
I think it is acceptable for a different order of Filters in a FilterList to
lead to different results.
Filter1 filters out some KVs/rows according to its options; then Filter2 comes
and filters out some of the remaining KVs/rows.
In the other situation, Filter2 comes first and kicks out some KVs/rows, so
Filter1 may not see many of the KVs/rows it should have seen, and different
results come out.
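
To make the ordering effect concrete, here is a minimal sketch in plain Java.
It is NOT the HBase API: RowFilter, ValueEquals, PageLimit and scan() are
hypothetical stand-ins, and the sketch assumes the list asks each filter in
turn and stops at the first one that rejects the row. Under that assumption, a
counting filter placed first has already counted a row that a later filter
then drops.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified model of order-dependent filters. All names are hypothetical.
final class FilterOrderSketch {

    interface RowFilter {
        boolean filterRow(int value);   // true = drop this row
        boolean filterAllRemaining();   // true = stop scanning
    }

    // Stateless: keeps only rows whose value equals `wanted`.
    static final class ValueEquals implements RowFilter {
        private final int wanted;
        ValueEquals(int wanted) { this.wanted = wanted; }
        public boolean filterRow(int value) { return value != wanted; }
        public boolean filterAllRemaining() { return false; }
    }

    // Stateful: counts every row it is asked about and cannot undo a count.
    static final class PageLimit implements RowFilter {
        private final int pageSize;
        private int rowsSeen = 0;
        PageLimit(int pageSize) { this.pageSize = pageSize; }
        public boolean filterRow(int value) { rowsSeen++; return rowsSeen > pageSize; }
        public boolean filterAllRemaining() { return rowsSeen >= pageSize; }
    }

    // Applies the filters in list order; the first rejection ends the check,
    // so later filters are never asked about that row.
    static List<Integer> scan(List<Integer> rows, List<RowFilter> filters) {
        List<Integer> kept = new ArrayList<>();
        for (int value : rows) {
            boolean stop = false;
            for (RowFilter f : filters) {
                if (f.filterAllRemaining()) { stop = true; break; }
            }
            if (stop) break;
            boolean dropped = false;
            for (RowFilter f : filters) {
                if (f.filterRow(value)) { dropped = true; break; }
            }
            if (!dropped) kept.add(value);
        }
        return kept;
    }

    // Same rows and same filters; only the order of the filters differs.
    static List<Integer> run(boolean valueFilterFirst) {
        List<Integer> rows = Arrays.asList(0, 0, 0, 1, 1, 1);
        RowFilter value = new ValueEquals(1);
        RowFilter page = new PageLimit(2);
        List<RowFilter> filters = valueFilterFirst
                ? Arrays.asList(value, page)
                : Arrays.asList(page, value);
        return scan(rows, filters);
    }

    public static void main(String[] args) {
        System.out.println("value filter first: " + run(true));
        System.out.println("page filter first:  " + run(false));
    }
}
```

With the value filter first, the counter only sees rows that already matched,
so two matching rows come back. With the page filter first, its count is used
up by the non-matching rows and nothing comes back, which is the empty-result
case Anoop describes in the quoted message.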

As a user, I naturally assumed that the Filters in a FilterList work one by
one, and I see nothing wrong with that. If you want to add more documentation
on this or on other features of HBase, that is very welcome. I think there is
still not enough material for us newcomers to HBase...

Best Regards,
NN

2012/6/5 Anoop Sam John <anoopsj@huawei.com>

> Hi All
> One thing came up while going through the Filter code.
>
> Suppose I am using a FilterList along with my Scan. The list contains one
> PageFilter (page size=N) and one SingleColumnValueFilter. [One filter checks
> a column value and the other limits the number of rows in the result.] So as
> a user, what I expect from this usage is to get N rows where colval=X.
> Now if I create my FilterList like below, things work fine:
>
> FilterList list = new FilterList();
> SingleColumnValueFilter f = new SingleColumnValueFilter(..);
> f.setLatestVersionOnly(false);
> list.addFilter(f);
> list.addFilter(new PageFilter(..));
>
> Now use the same code with a slight difference in the order in which the
> filters are added:
>
> FilterList list = new FilterList();
> list.addFilter(new PageFilter(..));
> SingleColumnValueFilter f = new SingleColumnValueFilter(..);
> f.setLatestVersionOnly(false);
> list.addFilter(f);
>
>
> As a user I would expect to get the same result, but that may not happen.
> This can even return empty results.
> Here the filter that tracks the number of returned rows comes first. So even
> if the second filter later filters out a KV or row, the count tracked within
> the first filter has already been incremented. [The first filter never gets
> a chance to roll back the operation when the next filter filters out the
> row/KV.]
>
> If we use the first ordering, in which the filters dealing with the number
> of rows or KVs are the last items in the FilterList, there won't be any
> problem. Does anyone feel this is an issue with our filter framework and
> FilterList? At least we should document this clearly, I think. Please give
> your suggestions.
>
> Note: In the code sample above, if we use latest version only=true for
> SingleColumnValueFilter, even the second scenario will work fine. The
> difference is that skipping a row is then handled by filterRowKey() itself,
> and filterRow() won't get called for the ignored row key. [PageFilter deals
> with filterRow().]
> So it all depends on which of the APIs the filter uses to decide whether to
> include, skip, or seek to rows.
>
>
> Also, there is one issue open regarding FilterList: HBASE-6132.
> Please give your valuable thoughts and suggestions.
>
>
> -Anoop-
>
