lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: Why Lucene has to rewrite queries prior to actual searching?
Date Mon, 07 Apr 2008 22:55:40 GMT
Itamar,

Have a look here:
http://lucene.apache.org/java/2_3_1/scoring.html

Regards,
Paul Elschot

Op Tuesday 08 April 2008 00:34:48 schreef Itamar Syn-Hershko:
> Paul and John,
>
> Thanks for your quick reply.
>
> The problem with query rewriting is the beforementioned
> MaxClauseException. Instead of inflating the query and passing a
> deterministic list of terms to the actual search routine, Lucene
> could have accessed the vectors in the index using some sort of
> filter. So, for example, if it knows to access "Foobar" by its name
> in the index, why can't it take "Foo*" and just get all the vectors
> until "Fop" is met (for example). Why does it have to get
> deterministic list of terms?
>
> I will take a look at the Scorer - can you describe in short what
> exactly it does and where and when it is being called?
>
> I don't get John's comment though - Query::rewrite is being called
> prior to the actual searching (through QueryParser), how come it can
> use "information gathered from IndexReader at search time"?
>
> Itamar.
>
> -----Original Message-----
> From: Paul Elschot [mailto:paul.elschot@xs4all.nl]
> Sent: Tuesday, April 08, 2008 12:57 AM
> To: java-user@lucene.apache.org
> Subject: Re: Why Lucene has to rewrite queries prior to actual
> searching?
>
> Itamar,
>
> Query rewrite replaces wildcards with terms available from the index.
> Usually that involves replacing a wildcard with a BooleanQuery that
> is an effective OR over the available terms while using a flat
> coordination factor, i.e. it does not matter how many of the
> available terms actually match a document, as long as at least one
> matches.
>
> For the required query parts (AND like), Scorer.skipTo() is used, and
> that could well be the filter mechanism you are referring to; have a
> look at the javadocs of Scorer, and, if necessary, at the actual code
> of ConjunctionScorer.
>
> Regards,
> Paul Elschot
>
> Op Monday 07 April 2008 23:13:09 schreef Itamar Syn-Hershko:
> > Hi all,
> >
> > Can someone from the experts here explain why Lucene has to get a
> > "rewritten" query for the Searcher - so Phrase or Wildcards queries
> > have to rewrite themselves into a "primitive" query, that is then
> > passed to Lucene to look for? I'm probably not familiar too much
> > with the internals of Lucene, but I'd imagine that if you can
> > inflate a query using wildcards via xxxxQuery sub classing, you
> > could as easily (?) have some sort of Filter mechanism during the
> > search, so that Lucene retrieves the Position vectors for all the
> > terms that pass that filter, instead of retrieving only the
> > position data for deterministic terms (with no wildcards etc.). If
> > that was possible to do somehow, it could greatly increase the
> > searchability of Lucene indices by using RegEx (without re-writing
> > and getting the dreaded MaxClauseCount error) and similar.
> >
> > Would love to hear some insights on this one.
> >
> > Itamar.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message