lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <>
Subject RE: Why Lucene has to rewrite queries prior to actual searching?
Date Thu, 10 Apr 2008 17:32:38 GMT

: term from the query, this part of Lucene should be smarter to know how to
: handle wildcards or even regex, so if "foo*" is received from the query, it
: will start with retieving the TermInfo for just "foo", and then will
: continue and add up more and more TermInfo structure to its cache (or
: whatever else it does with them) until the pattern will no longer match. The

This is what rewriting does.

The crux of this thread basicly seems to by "why are two passes made" why 
do things like WildCardQuery and PrefixQuery have a rewrite method that 
does one pass to build up a list of terms, and then have Scorers that do a 
second pass of all documents that match those rewriten queries.

The answer is scoring: The vby rewriting things like WildCardQueries into 
TermQueries, the resulting queries can take into account the tf and idf of 
the individual terms.

An alternative like what you describe is in fact posisble with 
ConstantScoreQuery, and is exactly how ConstantScoreRangeQuery works - The 
Terms in the range are iterated exactly once and the docs are recorded, 
but those documents are all scored equally.  You can do the samething with 
any Filter -- Solr uses the PrefixFilter to do this instead of using 
PrefixQueries, it would also be fairly easy to write a generic Filter that 
could take in any TermEnum (so you could pass it a FilteredTermEnum like     
FuzzyTermEnum, RegexTermEnum, WildcardTermEnum, etc...)

: Obviously, queries sent to this searcher will have to pass a lighter
: QueryParser, so they keep their wildcards.

no, the QueryParser would just need to return instances of queries that 
are (or rewrite to) ConstantScoreQuery.

: This has obvious speed, accuracy, and simplicity gain. As I mentioned

Your "accuracy" claim is highly questionable - it would be as accurate 
from a "matching" standpoint, but most of hte useful scoring information 
would be completley lost.


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message