From: Paul Elschot
To: java-user@lucene.apache.org
Subject: Re: Why Lucene has to rewrite queries prior to actual searching?
Date: Tue, 8 Apr 2008 00:55:40 +0200

Itamar,

Have a look here:
http://lucene.apache.org/java/2_3_1/scoring.html

Regards,
Paul Elschot

On Tuesday 08 April 2008 00:34:48, Itamar Syn-Hershko wrote:
> Paul and John,
>
> Thanks for your quick reply.
>
> The problem with query rewriting is the aforementioned
> MaxClauseException. Instead of inflating the query and passing a
> deterministic list of terms to the actual search routine, Lucene
> could have accessed the vectors in the index using some sort of
> filter. So, for example, if it knows how to access "Foobar" by its
> name in the index, why can't it take "Foo*" and just get all the
> vectors until "Fop" is met (for example)? Why does it have to get a
> deterministic list of terms?
>
> I will take a look at the Scorer - can you describe in short what
> exactly it does and where and when it is being called?
>
> I don't get John's comment though - Query::rewrite is being called
> prior to the actual searching (through QueryParser), so how come it
> can use "information gathered from IndexReader at search time"?
>
> Itamar.
>
> -----Original Message-----
> From: Paul Elschot [mailto:paul.elschot@xs4all.nl]
> Sent: Tuesday, April 08, 2008 12:57 AM
> To: java-user@lucene.apache.org
> Subject: Re: Why Lucene has to rewrite queries prior to actual
> searching?
>
> Itamar,
>
> Query rewrite replaces wildcards with terms available from the index.
> Usually that involves replacing a wildcard with a BooleanQuery that
> is effectively an OR over the available terms while using a flat
> coordination factor, i.e. it does not matter how many of the
> available terms actually match a document, as long as at least one
> matches.
>
> For the required query parts (AND-like), Scorer.skipTo() is used, and
> that could well be the filter mechanism you are referring to; have a
> look at the javadocs of Scorer, and, if necessary, at the actual code
> of ConjunctionScorer.
>
> Regards,
> Paul Elschot
>
> On Monday 07 April 2008 23:13:09, Itamar Syn-Hershko wrote:
> > Hi all,
> >
> > Can one of the experts here explain why Lucene has to get a
> > "rewritten" query for the Searcher - so Phrase or Wildcard queries
> > have to rewrite themselves into a "primitive" query, that is then
> > passed to Lucene to look for? I'm probably not very familiar
> > with the internals of Lucene, but I'd imagine that if you can
> > inflate a query using wildcards via xxxxQuery subclassing, you
> > could as easily (?) have some sort of Filter mechanism during the
> > search, so that Lucene retrieves the position vectors for all the
> > terms that pass that filter, instead of retrieving only the
> > position data for deterministic terms (with no wildcards etc.). If
> > that were possible somehow, it could greatly increase the
> > searchability of Lucene indices by using RegEx (without rewriting
> > and getting the dreaded MaxClauseCount error) and similar.
> >
> > Would love to hear some insights on this one.
> >
> > Itamar.
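
The prefix expansion discussed in the thread above ("Foo*" scanned until
"Fop" is met, then turned into an OR over the matching terms) can be
sketched roughly as below. This is a toy model and not Lucene code: a
TreeSet stands in for the sorted term dictionary, and the class name
PrefixExpansionSketch, the method expandPrefix, and the limit MAX_CLAUSES
(set to 1024 here) are all names assumed for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class PrefixExpansionSketch {
    // Assumed stand-in for a clause limit; the name and value are illustrative only.
    public static final int MAX_CLAUSES = 1024;

    /**
     * Collects every term starting with the given prefix by scanning the
     * sorted dictionary from the prefix onward. Each collected term would
     * become one clause of an OR query in the rewritten form.
     */
    public static List<String> expandPrefix(TreeSet<String> termDictionary,
                                            String prefix, int maxClauses) {
        List<String> clauses = new ArrayList<>();
        // tailSet(prefix, true) begins at the first term >= prefix.
        for (String term : termDictionary.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            if (clauses.size() == maxClauses) {
                // Toy analogue of the clause-count error mentioned in the thread.
                throw new IllegalStateException("more than " + maxClauses + " clauses");
            }
            clauses.add(term);
        }
        return clauses;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>(
                Arrays.asList("Fob", "Foo", "Foobar", "Food", "Fop", "Fox"));
        // Prints [Foo, Foobar, Food]; the scan stops when it reaches "Fop".
        System.out.println(expandPrefix(terms, "Foo", MAX_CLAUSES));
    }
}
```

In this model the range scan and the rewrite are the same step: the scan
produces the term list, and the limit only bounds how long that list may
grow before the expansion is abandoned.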
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
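
The way a skipTo-style call can act as a filter for required (AND-like)
query parts, as mentioned in the thread, can be illustrated with a small
intersection of two sorted document-id lists. This is only a sketch under
assumptions: Postings is a hypothetical toy iterator, not Lucene's Scorer,
its skipTo scans linearly where a real index could use skip data, and it
may stay on the current document if that document already satisfies the
target.

```java
import java.util.ArrayList;
import java.util.List;

public class ConjunctionSketch {
    /** Toy iterator over a sorted array of document ids (hypothetical, not Lucene's Scorer). */
    public static final class Postings {
        public static final int NO_MORE_DOCS = Integer.MAX_VALUE;
        private final int[] docs;
        private int pos = -1; // before the first document

        public Postings(int... sortedDocIds) {
            this.docs = sortedDocIds;
        }

        private int current() {
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        /** Moves to the next document id, or returns NO_MORE_DOCS when exhausted. */
        public int next() {
            if (pos < docs.length) {
                pos++;
            }
            return current();
        }

        /** Moves to the first document id >= target at or after the current position. */
        public int skipTo(int target) {
            if (pos < 0) {
                pos = 0;
            }
            while (pos < docs.length && docs[pos] < target) {
                pos++;
            }
            return current();
        }
    }

    /** Returns the ids present in both lists; the list that is behind skips to the other's id. */
    public static List<Integer> intersect(Postings a, Postings b) {
        List<Integer> hits = new ArrayList<>();
        int docA = a.next();
        int docB = b.skipTo(docA);
        while (docA != Postings.NO_MORE_DOCS && docB != Postings.NO_MORE_DOCS) {
            if (docA == docB) {
                hits.add(docA);
                docA = a.next();
                docB = b.skipTo(docA);
            } else if (docA < docB) {
                docA = a.skipTo(docB); // a is behind: jump forward to b's document
            } else {
                docB = b.skipTo(docA); // b is behind: jump forward to a's document
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Postings a = new Postings(1, 3, 5, 8, 13);
        Postings b = new Postings(2, 3, 8, 9, 13, 21);
        System.out.println(intersect(a, b)); // prints [3, 8, 13]
    }
}
```

Documents that appear in only one list are never collected; each list is
simply asked to skip past them, which is the filtering effect described
for the required query parts.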