Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
From: Ype Kingma <ykingma@xs4all.nl>
To: lucene-user@jakarta.apache.org
Subject: Re: Exact match detection
Date: Sun, 9 May 2004 14:03:34 +0200
User-Agent: KMail/1.5.4
References: <409DCA92.4070402@byzantine.no>
In-Reply-To: <409DCA92.4070402@byzantine.no>
Message-Id: <200405091403.34875.ykingma@xs4all.nl>

On Sunday 09 May 2004 08:07, Alexander Staubo wrote:
> I need to detect exact matches. For example, if the query is "foo
> bar", a document matching both terms "foo" and "bar" is considered an
> exact match; and everything else is considered an inexact match.
>
> Obviously a union with "+foo +bar" would work, but for performance
> reasons I need to avoid multiple queries.

The scoring layer is thin, and the underlying exact match implementation
is pretty efficient. If you are not interested in the score, you can use
the lower-level searching API with your own document collector that
ignores the score:

See the search() method here:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Searcher.html
and the HitCollector:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/HitCollector.html

Ignoring the score is only slightly wasteful, because the scoring layer is much
faster than the disk I/O needed for the search, so rolling your own search
method is normally not worthwhile for speed.

You can construct the Query either with the query parser or by hand,
in this case as a BooleanQuery containing two required TermQuery clauses,
one for "foo" and one for "bar".

For best performance, only record the document numbers in the collector, and
don't use them (for example to fetch the documents) until after the search.
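
To make the pattern concrete, here is a minimal sketch in plain Java. It
does not use Lucene at all: ToyIndex, DocCollector and ExactMatchSketch are
hypothetical stand-ins that I made up for illustration. In Lucene the
corresponding pieces would be the BooleanQuery with two required TermQuery
clauses, and a HitCollector passed to search(). The collector ignores the
score and only sets a bit per matching document; the bits are read after
the search has finished.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a hit collector callback; not a Lucene class.
interface DocCollector {
    void collect(int doc, float score);
}

// Toy in-memory index (an assumption for this sketch):
// maps each term to the set of document numbers that contain it.
final class ToyIndex {
    private final Map<String, BitSet> postings = new HashMap<>();

    void add(int doc, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new BitSet()).set(doc);
        }
    }

    // Reports every document that contains ALL of the required terms.
    void searchAllRequired(DocCollector collector, String... required) {
        BitSet result = null;
        for (String term : required) {
            BitSet docs = postings.getOrDefault(term, new BitSet());
            if (result == null) {
                result = (BitSet) docs.clone();
            } else {
                result.and(docs); // keep only documents matching every term so far
            }
        }
        if (result == null) {
            return; // no terms given, nothing to report
        }
        for (int doc = result.nextSetBit(0); doc >= 0; doc = result.nextSetBit(doc + 1)) {
            collector.collect(doc, 1.0f); // the score is a placeholder in this toy
        }
    }
}

final class ExactMatchSketch {
    // Collects the exact matches and ignores the score. The collector only
    // records document numbers; they are used after the search returns.
    static BitSet exactMatches(ToyIndex index, String... terms) {
        final BitSet hits = new BitSet();
        index.searchAllRequired((doc, score) -> hits.set(doc), terms);
        return hits;
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(0, "foo", "bar");
        index.add(1, "foo");
        index.add(2, "bar", "baz");
        index.add(3, "bar", "foo", "baz");

        BitSet exact = exactMatches(index, "foo", "bar");
        System.out.println(exact); // prints {0, 3}
    }
}
```

With the exact matches in a bit set, checking whether a hit from the looser
query is also an exact match is a single exact.get(doc) call.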

You may also want to check out the Lucene wiki for some articles with
examples on how to use the Lucene API.

Good luck,
Ype

