Mailing-List: contact java-dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-dev@lucene.apache.org
Received-SPF: pass (nike.apache.org: local policy)
Content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
Subject: RE: PrefixQuery.rewrite
Date: Fri, 17 Apr 2009 18:03:15 -0400
Message-ID: <3CA90CC651AE3F4BAEDF8A5B78639C8C02BC26D4@mail02.tveyes.com>
In-Reply-To: <0BE09006E68E4CDFBD73F7FB1410EC6D@VEGA>
Thread-Topic: PrefixQuery.rewrite
thread-index: Acm/kU/wzjm1RHxoQrCZnJSobCqmSgADeRyQAAIi2YA=
References: <3CA90CC651AE3F4BAEDF8A5B78639C8C02BC26AE@mail02.tveyes.com>
 <0BE09006E68E4CDFBD73F7FB1410EC6D@VEGA>
From: "David Seltzer" <dseltzer@TVEyes.com>
To: <java-dev@lucene.apache.org>

Thanks for the explanation! I was mistaken in my understanding of the
sort order of TermEnum.

-Dave

-----Original Message-----
From: Uwe Schindler [mailto:uwe@thetaphi.de]
Sent: Friday, April 17, 2009 5:08 PM
To: java-dev@lucene.apache.org
Subject: RE: PrefixQuery.rewrite

Hi Dave,

The code is correct; here are my comments:

> This
> code, as I understand it, is designed to expand a prefix wildcard and
> rewrite the query as a long boolean series of ANDs.
>
> To improve performance the code has a Break statement designed to kick
> out of the TermEnum starts enumerating on another field.
>
>   //FROM /src/java/org/apache/lucene/search/PrefixQuery.java
>   public Query rewrite(IndexReader reader) throws IOException {
>     BooleanQuery query = new BooleanQuery(true);

Here a new TermEnum is created, which starts at the term
prefix = new Term(field, prefixText). The TermEnum is ordered by
(field, term text). IndexReader.terms(Term) returns a TermEnum that is
positioned exactly at the given term or, if that term does not exist, at
the next term following it in the order described above:

>     TermEnum enumerator = reader.terms(prefix);
>     try {
>       String prefixText = prefix.text();
>       String prefixField = prefix.field();
>       do {
>         Term term = enumerator.term();

This check does exactly what you think; it is the exit condition:
If the term is from another field, exit.
If the term is null, the enumeration is exhausted, exit.
If the term does not start with the prefix, also exit.
This condition is sufficient. If the enum was initially positioned exactly
on the prefix term itself, that term really is the first one with the
prefix, and no term was skipped. If the initial term is not the prefix
term but a larger one, there are two cases:
a) it starts with the prefix -> iterate further
b) it does not start with the prefix -> no term with that prefix exists
   in this field at all.

>         if (term != null &&
>             term.text().startsWith(prefixText) &&
>             term.field() == prefixField) // interned comparison
>         {
>           TermQuery tq = new TermQuery(term);        // found a match
>           tq.setBoost(getBoost());                   // set the boost
>           query.add(tq, BooleanClause.Occur.SHOULD); // add to query
>           //System.out.println("added " + term);
>         } else {
>           break;
>         }
>       } while (enumerator.next());
>     } finally {
>       enumerator.close();
>     }
>     return query;
>   }
>
> I think that there may be a logic problem here - - - to me it seems that
> if I performed a prefix query on a Field that wasn't first in line
> during the TermEnum's output that my prefix would never be expanded.
> I may be misunderstanding the ordering that IndexReader.terms(Term)
> produces.
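
To make the ordering argument concrete, here is a minimal sketch that does
not use Lucene at all. A java.util.TreeSet stands in for the term
dictionary, and tailSet() plays the role of the initial seek. All names in
it (PrefixScanSketch, Term, expand) are made up for this illustration and
are assumptions, not Lucene API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

final class PrefixScanSketch {

    /** Hypothetical stand-in for a term: ordered by field, then by text. */
    static final class Term implements Comparable<Term> {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public int compareTo(Term other) {
            int byField = field.compareTo(other.field);
            return byField != 0 ? byField : text.compareTo(other.text);
        }
    }

    /** Collects the texts of all terms in the given field that start with prefixText. */
    static List<String> expand(TreeSet<Term> index, String field, String prefixText) {
        List<String> matches = new ArrayList<>();
        // Seek: start at the prefix term itself or, if absent, at the next larger term.
        for (Term term : index.tailSet(new Term(field, prefixText), true)) {
            // Exit condition: another field, or a text that no longer has the prefix.
            if (!term.field.equals(field) || !term.text.startsWith(prefixText)) {
                break;
            }
            matches.add(term.text);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<Term> index = new TreeSet<>();
        index.add(new Term("author", "smith"));
        index.add(new Term("body", "lucene"));
        index.add(new Term("title", "apple"));
        index.add(new Term("title", "lucene"));
        index.add(new Term("title", "lucid"));
        index.add(new Term("title", "zebra"));

        // "title" is the last field in sort order, yet the scan still finds its terms.
        System.out.println(expand(index, "title", "luc")); // prints [lucene, lucid]
    }
}
```

Because the seek lands inside the requested field, it does not matter
whether that field sorts first or last; the break only fires once the scan
has moved past every term that could carry the prefix.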


Uwe


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

