lucene-dev mailing list archives

From "Shai Erera (JIRA)" <>
Subject [jira] Commented: (LUCENE-1518) Merge Query and Filter classes
Date Thu, 30 Apr 2009 11:54:30 GMT


Shai Erera commented on LUCENE-1518:

bq. Wrapping with CSQ is just adding another layer between the Lucene search machinery and Filter,
making these optimizations harder.

Right. But making Filter subclass Query and checking in BQ 'if (query instanceof Filter) { Filter
f = (Filter) query; }' is not going to improve anything. It adds instanceof and casting, and
I'd think those are more expensive than wrapping a Filter with CSQ and returning an appropriate
Scorer, which will use the Filter in its next() and skipTo() calls.
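
To make the wrapping idea concrete, here is a minimal sketch of a constant-score Scorer that simply forwards next() and skipTo() to the Filter's iterator. It does not use the real Lucene classes: DocIterator, SortedIdsFilter and ConstantScorer are hypothetical stand-ins I made up for illustration, loosely modeled on the next()/skipTo()/doc() iteration style, so treat it as an outline rather than actual CSQ code.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: every type here is a hypothetical stand-in, not a Lucene class. */
final class ConstantScoreSketch {

    /** Simplified doc-id iterator with next()/skipTo()/doc(). */
    interface DocIterator {
        boolean next();              // move to the next matching doc, false when exhausted
        boolean skipTo(int target);  // move to the first later match with id >= target
        int doc();                   // current doc id, valid after next()/skipTo() returned true
    }

    /** A "filter" backed by a sorted array of matching doc ids. */
    static final class SortedIdsFilter {
        private final int[] ids;

        SortedIdsFilter(int... sortedIds) {
            this.ids = sortedIds.clone();
        }

        DocIterator iterator() {
            return new DocIterator() {
                private int pos = -1;

                @Override public boolean next() { return ++pos < ids.length; }

                @Override public boolean skipTo(int target) {
                    // Always advances at least once, then keeps going until id >= target.
                    do {
                        if (!next()) return false;
                    } while (ids[pos] < target);
                    return true;
                }

                @Override public int doc() { return ids[pos]; }
            };
        }
    }

    /** Scorer that delegates iteration to the filter and returns one fixed score. */
    static final class ConstantScorer implements DocIterator {
        private final DocIterator delegate;
        private final float score;

        ConstantScorer(SortedIdsFilter filter, float score) {
            this.delegate = filter.iterator();
            this.score = score;
        }

        @Override public boolean next() { return delegate.next(); }
        @Override public boolean skipTo(int target) { return delegate.skipTo(target); }
        @Override public int doc() { return delegate.doc(); }
        float score() { return score; }
    }

    /** Drains the scorer and records "doc:score" for each remaining hit. */
    static List<String> collect(ConstantScorer scorer) {
        List<String> hits = new ArrayList<>();
        while (scorer.next()) {
            hits.add(scorer.doc() + ":" + scorer.score());
        }
        return hits;
    }

    public static void main(String[] args) {
        ConstantScorer scorer = new ConstantScorer(new SortedIdsFilter(2, 5, 9), 1.0f);
        System.out.println(scorer.skipTo(4) + " -> doc " + scorer.doc());
        System.out.println(collect(scorer));
    }
}
```

The only per-document overhead in this sketch is one extra method call per next()/skipTo(); whether that is really cheaper than an instanceof check plus a cast would need measuring.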

bq. On the other hand, I must accept that, conceptually, Filter and Query are "the same", together
supporting the following options

I think that if we allow BooleanClause to implement a Weight(IndexReader) (just like Query)
we'll be one more step closer to that goal? BQ uses this method to construct BooleanWeight,
only today it calls clause.getQuery().createWeight(). Instead it could do clause.getWeight,
and if the BooleanClause holds a Filter it will return a FilterWeight, otherwise delegate
that call to the contained Query.
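
Roughly, the delegation I have in mind could look like the sketch below. Again this is not the real BooleanClause: Weight, Query, Filter, FilterWeight and Clause are simplified, hypothetical stand-ins, and the forQuery()/forFilter() factory names are made up for the example.

```java
import java.util.Objects;

/** Illustration only: hypothetical stand-ins, not the Lucene API. */
final class ClauseWeightSketch {

    interface Weight { String describe(); }

    interface Query { Weight createWeight(); }

    interface Filter { boolean matches(int doc); }

    /** Weight that wraps a filter; a real one would hand out a constant-score scorer. */
    static final class FilterWeight implements Weight {
        private final Filter filter;

        FilterWeight(Filter filter) {
            this.filter = Objects.requireNonNull(filter);
        }

        Filter filter() { return filter; }

        @Override public String describe() { return "constant-score weight over a filter"; }
    }

    /** A clause holds either a query or a filter, never both. */
    static final class Clause {
        private final Query query;
        private final Filter filter;

        private Clause(Query query, Filter filter) {
            this.query = query;
            this.filter = filter;
        }

        static Clause forQuery(Query query) {
            return new Clause(Objects.requireNonNull(query), null);
        }

        static Clause forFilter(Filter filter) {
            return new Clause(null, Objects.requireNonNull(filter));
        }

        /** The caller asks the clause for a weight and never inspects what it holds. */
        Weight getWeight() {
            return filter != null ? new FilterWeight(filter) : query.createWeight();
        }
    }

    public static void main(String[] args) {
        Clause filterClause = Clause.forFilter(doc -> doc % 2 == 0);
        Clause queryClause = Clause.forQuery(() -> () -> "weight created by the query");
        System.out.println(filterClause.getWeight().describe());
        System.out.println(queryClause.getWeight().describe());
    }
}
```

The point of the sketch is that the decision is made once, when the weight is built, and the calling code needs neither instanceof nor a cast.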

Regarding pure ranked, CSQ is really what we need, no?

So how about the following:
# Add add(Filter, Occur) to BooleanClause.
# Add weight(Searcher) to BooleanClause.
# Create a FilterWeight which wraps a Filter and provides a Scorer implementation with a constant
score. (This does not handle the "no scoring" mode, unless "no scoring" can be achieved with
score=0.0f, while constant is any other value, defaulting to 1.0f).
# Add isRandomAccess to Filter.
# Create a RandomAccessFilter which extends Filter and defines an additional seek(target) method.
# Add asRandomAccessFilter() to Filter, which will materialize that Filter into memory, or
into another RandomAccess data structure (e.g. keeping it on disk but still providing random
access to it, even if not very efficiently) and return a RandomAccessFilter type, which will
implement seek(target) and possibly override next() and skipTo(), but still use whatever other
methods this Filter declares.
#* I think we should default it to throw UOE, provided that we document that isRandomAccess
should be called first.
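
A rough sketch of the isRandomAccess / asRandomAccessFilter part of the list above, under a few assumptions of mine: the classes are hypothetical stand-ins rather than Lucene's Filter, sortedDocIds() stands in for getDocIdSet(), seek(target) is assumed to return whether the target doc matches (the list does not pin down its semantics), and materializing into a java.util.BitSet is just one possible choice.

```java
import java.util.BitSet;

/** Illustration only: hypothetical stand-ins, not the Lucene API. */
final class RandomAccessSketch {

    /** Base filter: sequential access only unless a subclass says otherwise. */
    abstract static class Filter {
        abstract int[] sortedDocIds();  // stand-in for getDocIdSet()

        boolean isRandomAccess() { return false; }

        /** Default is UOE; callers are expected to check isRandomAccess() first. */
        RandomAccessFilter asRandomAccessFilter() {
            throw new UnsupportedOperationException("call isRandomAccess() first");
        }
    }

    /** Adds seek(target); assumed here to answer "does this doc match?". */
    abstract static class RandomAccessFilter extends Filter {
        abstract boolean seek(int target);

        @Override boolean isRandomAccess() { return true; }

        @Override RandomAccessFilter asRandomAccessFilter() { return this; }
    }

    /** A filter that can materialize itself into an in-memory bit set. */
    static final class ArrayFilter extends Filter {
        private final int[] ids;

        ArrayFilter(int... sortedIds) { this.ids = sortedIds.clone(); }

        @Override int[] sortedDocIds() { return ids.clone(); }

        @Override boolean isRandomAccess() { return true; }

        @Override RandomAccessFilter asRandomAccessFilter() {
            final BitSet bits = new BitSet();
            for (int id : ids) {
                bits.set(id);
            }
            return new RandomAccessFilter() {
                @Override int[] sortedDocIds() { return ids.clone(); }

                @Override boolean seek(int target) {
                    return target >= 0 && bits.get(target);
                }
            };
        }
    }

    /** A filter that keeps the defaults and therefore stays sequential. */
    static final class SequentialOnlyFilter extends Filter {
        private final int[] ids;

        SequentialOnlyFilter(int... sortedIds) { this.ids = sortedIds.clone(); }

        @Override int[] sortedDocIds() { return ids.clone(); }
    }

    public static void main(String[] args) {
        Filter[] filters = { new ArrayFilter(1, 4, 7), new SequentialOnlyFilter(3) };
        for (Filter f : filters) {
            if (f.isRandomAccess()) {
                System.out.println("seek(4) = " + f.asRandomAccessFilter().seek(4));
            } else {
                System.out.println("sequential only, " + f.sortedDocIds().length + " doc(s)");
            }
        }
    }
}
```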

I'm thinking out loud just like you, so I hope my stuff makes sense :).

> Merge Query and Filter classes
> ------------------------------
>                 Key: LUCENE-1518
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Search
>    Affects Versions: 2.4
>            Reporter: Uwe Schindler
>             Fix For: 2.9
>         Attachments: LUCENE-1518.patch
> This issue presents a patch, that merges Queries and Filters in a way, that the new Filter
class extends Query. This would make it possible, to use every filter as a query.
> The new abstract filter class would contain all methods of ConstantScoreQuery, deprecate
ConstantScoreQuery. If somebody implements the Filter's getDocIdSet()/bits() methods he has
nothing more to do, he could just use the filter as a normal query.
> I do not want to completely convert Filters to ConstantScoreQueries. The idea is to combine
Queries and Filters in such a way, that every Filter can automatically be used at all places
where a Query can be used (e.g. also alone a search query without any other constraint). For
that, the abstract Query methods must be implemented and return a "default" weight for Filters
which is the current ConstantScore Logic. If the filter is used as a real filter (where the
API wants a Filter), the getDocIdSet part could be directly used, the weight is useless (as
it is currently, too). The constant score default implementation is only used when the Filter
is used as a Query (e.g. as direct parameter to For the special case of
BooleanQueries combining Filters and Queries the idea is, to optimize the BooleanQuery logic
in such a way, that it detects if a BooleanClause is a Filter (using instanceof) and then
directly uses the Filter API and not take the burden of the ConstantScoreQuery (see LUCENE-1345).
> Here some ideas how to implement with Query and Filter:
> - User runs using a Filter as the only parameter. As every Filter is
also a ConstantScoreQuery, the query can be executed and returns score 1.0 for all matching
> - User runs using a Query as the only parameter: No change, all is
the same as before
> - User runs using a BooleanQuery as parameter: If the BooleanQuery
does not contain a Query that is subclass of Filter (the new Filter) everything as usual.
If the BooleanQuery only contains exactly one Filter and nothing else the Filter is used as
a constant score query. If BooleanQuery contains clauses with Queries and Filters the new
algorithm could be used: The queries are executed and the results filtered with the filters.
> For the user this has the main advantage that he can construct his query using a simplified
API without thinking about Filters or Queries; you can just combine clauses together. The
scorer/weight logic then identifies the cases to use the filter or the query weight API. Just
like the query optimizer of an RDB.

