lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Itamar Syn-Hershko" <ita...@divrei-tora.com>
Subject RE: Why Lucene has to rewrite queries prior to actual searching?
Date Mon, 07 Apr 2008 22:34:48 GMT
Paul and John,

Thanks for your quick reply.

The problem with query rewriting is the beforementioned MaxClauseException.
Instead of inflating the query and passing a deterministic list of terms to
the actual search routine, Lucene could have accessed the vectors in the
index using some sort of filter. So, for example, if it knows to access
"Foobar" by its name in the index, why can't it take "Foo*" and just get all
the vectors until "Fop" is met (for example). Why does it have to get
deterministic list of terms?

I will take a look at the Scorer - can you describe in short what exactly it
does and where and when it is being called?

I don't get John's comment though - Query::rewrite is being called prior to
the actual searching (through QueryParser), how come it can use "information
gathered from IndexReader at search time"?

Itamar.

-----Original Message-----
From: Paul Elschot [mailto:paul.elschot@xs4all.nl] 
Sent: Tuesday, April 08, 2008 12:57 AM
To: java-user@lucene.apache.org
Subject: Re: Why Lucene has to rewrite queries prior to actual searching?

Itamar,

Query rewrite replaces wildcards with terms available from the index.
Usually that involves replacing a wildcard with a BooleanQuery that is an
effective OR over the available terms while using a flat coordination
factor, i.e. it does not matter how many of the available terms actually
match a document, as long as at least one matches.

For the required query parts (AND like), Scorer.skipTo() is used, and that
could well be the filter mechanism you are referring to; have a look at the
javadocs of Scorer, and, if necessary, at the actual code of
ConjunctionScorer.

Regards,
Paul Elschot





Op Monday 07 April 2008 23:13:09 schreef Itamar Syn-Hershko:
> Hi all,
>
> Can someone from the experts here explain why Lucene has to get a 
> "rewritten" query for the Searcher - so Phrase or Wildcards queries 
> have to rewrite themselves into a "primitive" query, that is then 
> passed to Lucene to look for? I'm probably not familiar too much with 
> the internals of Lucene, but I'd imagine that if you can inflate a 
> query using wildcards via xxxxQuery sub classing, you could as easily
> (?) have some sort of Filter mechanism during the search, so that 
> Lucene retrieves the Position vectors for all the terms that pass that 
> filter, instead of retrieving only the position data for deterministic 
> terms (with no wildcards etc.). If that was possible to do somehow, it 
> could greatly increase the searchability of Lucene indices by using 
> RegEx (without re-writing and getting the dreaded MaxClauseCount 
> error) and similar.
>
> Would love to hear some insights on this one.
>
> Itamar.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org





---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message