lucene-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-4410) Make FilteredQuery more flexible with regards to how filters are applied
Date Thu, 20 Sep 2012 14:37:09 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-4410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459627#comment-13459627
] 

Uwe Schindler commented on LUCENE-4410:
---------------------------------------

Hi,
I have one more comment on the DocFirst strategy:
The idea is good because it lets the query drive the collector, and we only look up the docs
the query hits (using the random-access filter). This is sometimes better than passing the
filter down as acceptDocs, because that would slow things down if the Bits interface is
expensive and every query sub-clause must re-evaluate the Bits.get() method.
The problem I had with the patch is the crazy Bits implementation for the DocFirstStrategy,
which had exactly this problem. Also, it did not follow the random-access pattern, because
it allowed Bits.get() calls only in order. I can easily write a BooleanScorer1-like query
that violates this (because a query with more than one sub-clause can easily call Bits.get()
out of order for each sub-clause).
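
To illustrate the out-of-order concern, here is a minimal sketch. It is not the code from the
patch: RandomAccessBits, ForwardOnlyBits and probe are hypothetical names, and the filter is
modelled as a sorted int array. It only shows how a Bits-like object that can move forward
only gives a wrong answer once a caller probes an earlier doc after a later one.

```java
final class InOrderBitsPitfall {

    /** Hypothetical stand-in for a random-access filter view (not Lucene's Bits). */
    interface RandomAccessBits {
        boolean get(int doc);
    }

    /** A "Bits" that is really a forward-only cursor over the filter's sorted docs. */
    static final class ForwardOnlyBits implements RandomAccessBits {
        private final int[] sortedDocs;
        private int pos = 0;

        ForwardOnlyBits(int... sortedDocs) {
            this.sortedDocs = sortedDocs;
        }

        @Override
        public boolean get(int doc) {
            // The cursor only moves forward, so docs it has already passed are lost.
            while (pos < sortedDocs.length && sortedDocs[pos] < doc) {
                pos++;
            }
            return pos < sortedDocs.length && sortedDocs[pos] == doc;
        }
    }

    /** Calls get() in exactly the given order, as independent sub-clauses might. */
    static boolean[] probe(RandomAccessBits bits, int... docs) {
        boolean[] out = new boolean[docs.length];
        for (int i = 0; i < docs.length; i++) {
            out[i] = bits.get(docs[i]);
        }
        return out;
    }
}
```

With a filter containing docs 1, 3 and 5, probing in the order 1, 3, 5 reports all three as
accepted, while probing in the order 1, 5, 3 wrongly rejects doc 3 because the cursor has
already moved past it.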
The DocFirstStrategy wants the query to drive the collection, so the non-Bits approach should
either use LeapFrog (which may be expensive if the filter has an inefficient nextDoc()) or it
should also implement DocFirst in order. I would rename that strategy to QueryFirstStrategy
and make 2 scorers for it:
- a random-access one calling Bits.get() for every hit of the query
- a sequential one that calls nextDoc() only on the query, never on the filter. The filter
is only advanced to the current query doc. This way the filter only rarely scans through its
docs (when there is no hit after advance).
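
The two scorers proposed above could look roughly like the following sketch. It is
self-contained and does not use Lucene classes: DocIterator, RandomAccessBits, ArrayIterator
and both method names are assumptions made for illustration, with nextDoc()/advance()
semantics modelled on the usual doc-id iterator contract.

```java
import java.util.ArrayList;
import java.util.List;

final class QueryFirstSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical doc-id iterator, used for both the query and the filter. */
    interface DocIterator {
        int nextDoc();           // next doc, or NO_MORE_DOCS when exhausted
        int advance(int target); // first doc >= target beyond the current one, or NO_MORE_DOCS
    }

    /** Hypothetical random-access view of a filter. */
    interface RandomAccessBits {
        boolean get(int doc);
    }

    /** Demo iterator over a sorted array of doc ids. */
    static final class ArrayIterator implements DocIterator {
        private final int[] docs;
        private int idx = -1;

        ArrayIterator(int... docs) {
            this.docs = docs;
        }

        @Override
        public int nextDoc() {
            idx++;
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        @Override
        public int advance(int target) {
            int doc;
            do {
                doc = nextDoc();
            } while (doc < target);
            return doc;
        }
    }

    /** Random-access variant: one get() per query hit. */
    static List<Integer> randomAccess(DocIterator query, RandomAccessBits filter) {
        List<Integer> hits = new ArrayList<>();
        for (int doc = query.nextDoc(); doc != NO_MORE_DOCS; doc = query.nextDoc()) {
            if (filter.get(doc)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    /** Sequential variant: nextDoc() only on the query; the filter is only advanced. */
    static List<Integer> sequential(DocIterator query, DocIterator filter) {
        List<Integer> hits = new ArrayList<>();
        int filterDoc = -1;
        for (int doc = query.nextDoc(); doc != NO_MORE_DOCS; doc = query.nextDoc()) {
            if (filterDoc < doc) {
                filterDoc = filter.advance(doc); // catch the filter up to the query doc
            }
            if (filterDoc == doc) {
                hits.add(doc);
            }
        }
        return hits;
    }
}
```

In the sequential variant the filter is never asked for a doc the query has not reached, so
when advance() overshoots the current query doc, the filter simply waits there until the
query catches up.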
                
> Make FilteredQuery more flexible with regards to how filters are applied
> ------------------------------------------------------------------------
>
>                 Key: LUCENE-4410
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4410
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>    Affects Versions: 4.0-BETA
>            Reporter: Simon Willnauer
>             Fix For: 5.0, 4.0
>
>         Attachments: LUCENE-4410.patch
>
>
> Currently FilteredQuery uses either the "old" Lucene 3 leap frog approach or pushes the
filter down together with accepted docs. Yet there might be more strategies required to fit
common use cases like geo-filtering where a rather costly function is applied to each document.
Using leap frog this might result in a very slow query if the filter is advanced since it
might have linear running time to find the next valid document. We should be more flexible
with regards to those use cases and make it possible to either tell FQ what to do or plug in
a strategy that applies a filter in a different way.
> The current FQ impl also uses a heuristic to decide if RA or LeapFrog should be used.
This is really an implementation detail of the strategy and not of FQ and should be moved
out.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

