From: "David Seltzer"
Reply-To: java-dev@lucene.apache.org
Subject: RE: PrefixQuery.rewrite
Date: Fri, 17 Apr 2009 18:03:15 -0400
Message-ID: <3CA90CC651AE3F4BAEDF8A5B78639C8C02BC26D4@mail02.tveyes.com>
In-Reply-To: <0BE09006E68E4CDFBD73F7FB1410EC6D@VEGA>

Thanks for the explanation!
I was mistaken in my understanding of the sort order of TermEnum.

-Dave

-----Original Message-----
From: Uwe Schindler [mailto:uwe@thetaphi.de]
Sent: Friday, April 17, 2009 5:08 PM
To: java-dev@lucene.apache.org
Subject: RE: PrefixQuery.rewrite

Hi Dave,

The code is correct; here are my comments:

> This code, as I understand it, is designed to expand a prefix wildcard and
> rewrite the query as a long boolean series of ANDs.
>
> To improve performance the code has a break statement designed to kick
> out if the TermEnum starts enumerating on another field.
>
> //FROM /src/java/org/apache/lucene/search/PrefixQuery.java
>   public Query rewrite(IndexReader reader) throws IOException {
>     BooleanQuery query = new BooleanQuery(true);

Here a new TermEnum is created, which starts at the term
prefix = new Term(field, prefixText). The TermEnum is ordered by
(field, termtext). Calling reader.terms(term) retrieves a TermEnum that is
positioned exactly at the given term or, if that term does not exist, at the
next one following the requested term (in the order described above):

>     TermEnum enumerator = reader.terms(prefix);
>     try {
>       String prefixText = prefix.text();
>       String prefixField = prefix.field();
>       do {
>         Term term = enumerator.term();

This check does exactly what you think; it is the exit condition:

- If the term is from another field, exit.
- If the term is null, the enumeration is exhausted, so exit.
- If the term does not start with the prefix, also exit.

This condition is enough. If the initial positioning of the enum was exactly
on a term with the prefix (the prefix term itself), it really is the first
such term, and no term was missed. If the initial term was not exactly the
prefix term but a greater one, there are two possible cases:

a) It starts with the prefix -> iterate further.
b) It does not start with the prefix, so there was never a term with that
   prefix.
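
To make the ordering argument concrete, here is a minimal, self-contained sketch that does not use Lucene at all. It assumes terms can be modelled as (field, text) pairs sorted by field and then by text; the names PrefixScanSketch, T, and expand are hypothetical, and TreeSet.tailSet only stands in for the positioning behaviour of reader.terms(prefix) described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

public class PrefixScanSketch {

    /** Hypothetical stand-in for a term: a (field, text) pair. */
    static final class T implements Comparable<T> {
        final String field;
        final String text;

        T(String field, String text) {
            this.field = field;
            this.text = text;
        }

        // Order by field first, then by text, mirroring the (field, termtext) order.
        @Override
        public int compareTo(T other) {
            int c = field.compareTo(other.field);
            return c != 0 ? c : text.compareTo(other.text);
        }
    }

    /** Collects the texts of all terms in the given field that start with the prefix. */
    static List<String> expand(TreeSet<T> index, String field, String prefix) {
        List<String> matches = new ArrayList<>();
        // tailSet(..., true) plays the role of reader.terms(prefix): iteration starts
        // at the prefix term itself or, if it is absent, at the next greater term.
        for (T t : index.tailSet(new T(field, prefix), true)) {
            if (!t.field.equals(field) || !t.text.startsWith(prefix)) {
                break; // first non-matching term: no later term can match either
            }
            matches.add(t.text);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<T> index = new TreeSet<>();
        index.add(new T("author", "luke"));
        index.add(new T("body", "lucene"));
        index.add(new T("body", "lucid"));
        index.add(new T("body", "solr"));
        index.add(new T("title", "lucene"));
        // "body" is not the first field in the ordering, yet its terms are found.
        System.out.println(expand(index, "body", "luc")); // prints [lucene, lucid]
    }
}
```

Because all terms sharing a field and a prefix form one contiguous run in this ordering, stopping at the first mismatch cannot skip a match, regardless of where the field sorts among the others.
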
>         if (term != null &&
>             term.text().startsWith(prefixText) &&
>             term.field() == prefixField) // interned comparison
>         {
>           TermQuery tq = new TermQuery(term);         // found a match
>           tq.setBoost(getBoost());                    // set the boost
>           query.add(tq, BooleanClause.Occur.SHOULD);  // add to query
>           //System.out.println("added " + term);
>         } else {
>           break;
>         }
>       } while (enumerator.next());
>     } finally {
>       enumerator.close();
>     }
>     return query;
>   }
>
> I think that there may be a logic problem here. To me it seems that
> if I performed a prefix query on a Field that wasn't first in line
> during the TermEnum's output, my prefix would never be expanded.
> I may be misunderstanding the ordering that IndexReader.terms(Term)
> produces.

Uwe

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org