lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5495) Boolean Filter does not handle FilterClauses with only bits() implemented
Date Mon, 10 Mar 2014 21:50:43 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926281#comment-13926281
] 

Uwe Schindler commented on LUCENE-5495:
---------------------------------------

bq. In FilteredQuery, depending on the FilterStrategy, iterator() checked/called after bits().
I think if bits() is optional, and iterator() is not, then checking bits() first actually
does make sense, otherwise, since iterator() impl is mandatory, bits() impl would be ignored.
Maybe I am missing something...

- RandomAccessFilterStrategy first calls iterator() and cancels collection if iterator is
null. Then it asks for bits() and uses them, if available, otherwise falls back to good old
LeapFrog approach. RandomAccessFilterStrategy only chooses to use random access, if the useRandomAccess()
method returns true. In general it only does this if the iterator is not too sparse (it checks
first filter doc). This can be configured by the user with other heuristics (e.g. cost() function).
- LeapFrogFilterStrategy always uses iterator() (it's scorer is also the fall back for all
other cases, where bits() returns null)
- QueryFirstFilterStrategy first calls bits(), but if those are null it falls back to the
iterator()

The default is RandomAccessFilterStrategy.

bq. This patch can be think of fixing a shortcoming in BooleanFilter. Are we discouraging
use of BooleanFiilter?

I just complained about the algo. bits() must be purely optional. If it returns null, you
*must* also check the iterator(). If the iterator() also returns null, no documents match.

But your patch should in no case try to "emulate" the iterator by the BitsDocIdSetIterator!
iterator() is mandatory and is used as fallback if the bits() return null. But definitely
not the other way round. iterator() has to be implemented, otherwise its not a valid filter.

The UOE by the Facet filters is intentional, because those should never ever be used as a
filter in queries. Because of the way how FilteredQuery or ConstantScoreQuery works, the user
will get the UOE. I know this is a hack, but [~mikemccand] did this intentionally (earlier
versions used the same scorer you added in your patch to emulate, but that made users use
that filters and slow down their queries).

In my opinion, if we want to improve this, it should use a strategy like FilteredQuery, too.
We can improve maybe depending on cases like:
- all filters are AND'ed together and some support random access -> use the same approach
like FilteredQuery and pass down the bits of those filter's output as acceptDocs input for
the next filter
- some flters support random access, few of them only iterators. In that case the filter with
the iterator could drive the query and use bits() of other filters to filter out some documents.
The result can be handled as FixedBitSet, but improved that bits() is used partially (if available).
- many more strategies...

But on the other hand, as Mike says: Filters should be Queries without score and be handled
by BooleanQuery, BooleanFilter will go away then (or subsumed inside some BooleanQuery optimizations
choosen automatically depending on cost factors,...). We have a GSoC issue open. This is one
of the big priorities fixing the broken and unintuitive APIs around Query+Filter.

> Boolean Filter does not handle FilterClauses with only bits() implemented
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-5495
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5495
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 4.6.1
>            Reporter: John Wang
>         Attachments: LUCENE-5495.patch, LUCENE-5495.patch
>
>
> Some Filter implementations produce DocIdSets without the iterator() implementation,
such as o.a.l.facet.range.Range.getFilter().
> Currently, such filters cannot be added to a BooleanFilter because BooleanFilter expects
all FilterClauses with Filters that have iterator() implemented.
> This patch improves the behavior by taking Filters with bits() implemented and treat
them separately.
> This behavior would be faster in the case for Filters with a forward index as the underlying
data structure, where there would be no need to scan the index to build an iterator.
> See attached unit test, which fails without this patch.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message