From: Ype Kingma
To: lucene-user@jakarta.apache.org
Subject: Re: Exact match detection
Date: Sun, 9 May 2004 14:03:34 +0200

On Sunday 09 May 2004 08:07, Alexander Staubo wrote:
> I need to detect exact matches. For example, if the query is "foo
> bar", a document matching both terms "foo" and "bar" is considered an
> exact match; and everything else is considered an inexact match.
>
> Obviously a union with "+foo +bar" would work, but for performance
> reasons I need to avoid multiple queries.

The scoring layer is thin, and the underlying exact match implementation
is pretty efficient. If you are not interested in the score, you can use
the lower-level searching API with your own document collector that
ignores the score. See the search() method here:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Searcher.html
and the HitCollector:
http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/HitCollector.html

Computing a score that you then ignore is only slightly wasteful, because
the scoring layer is much faster than the disk I/O needed for the search.
Rolling your own search method is therefore normally not worthwhile for
speed.

You can construct the Query either with the query parser or yourself, in
this case as a BooleanQuery containing two required TermQuery objects, one
for "foo" and one for "bar".

For best performance, don't use the collected document numbers until after
the search has finished.

You may also want to check out the Lucene wiki for some articles with
examples of how to use the Lucene API.

Good luck,
Ype
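The collector pattern described above can be illustrated with a toy, dependency-free Java model. This is not Lucene code: `ToyCollector`, `BitSetCollector` and `searchBothRequired` are hypothetical names, the postings lists are made-up document numbers, and the conjunction is a plain merge of two sorted lists standing in for what a BooleanQuery with two required TermQuery objects does inside Lucene (the real implementation differs in detail). With the real API you would subclass HitCollector and pass it to Searcher.search(Query, HitCollector).

```java
import java.util.BitSet;

// Toy, dependency-free model of the "required terms + score-ignoring collector" idea.
// NOT Lucene code: ToyCollector only mimics the shape of HitCollector.collect(int doc, float score).
public class ExactMatchSketch {

    // Hypothetical callback interface modelled on Lucene's HitCollector.
    interface ToyCollector {
        void collect(int doc, float score);
    }

    // Collector that ignores the score and only records which documents matched.
    static class BitSetCollector implements ToyCollector {
        final BitSet matches = new BitSet();

        public void collect(int doc, float score) {
            matches.set(doc); // cheap: no document loading while the search runs
        }
    }

    // Conjunction of two required terms ("+foo +bar"): merge two sorted postings
    // lists of document numbers and report each document present in both.
    static void searchBothRequired(int[] postingsA, int[] postingsB, ToyCollector collector) {
        int i = 0, j = 0;
        while (i < postingsA.length && j < postingsB.length) {
            if (postingsA[i] == postingsB[j]) {
                collector.collect(postingsA[i], 1.0f); // placeholder score, ignored by the collector
                i++;
                j++;
            } else if (postingsA[i] < postingsB[j]) {
                i++;
            } else {
                j++;
            }
        }
    }

    public static void main(String[] args) {
        int[] foo = {0, 2, 3, 7}; // hypothetical documents containing "foo"
        int[] bar = {2, 5, 7, 9}; // hypothetical documents containing "bar"
        BitSetCollector collector = new BitSetCollector();
        searchBothRequired(foo, bar, collector);

        // Only after the search has finished, process the collected document numbers.
        for (int doc = collector.matches.nextSetBit(0); doc >= 0;
                doc = collector.matches.nextSetBit(doc + 1)) {
            System.out.println("exact match: doc " + doc);
        }
    }
}
```

Running main prints "exact match: doc 2" and "exact match: doc 7", the only documents containing both terms. The collector just sets bits during the search; any per-document work happens afterwards, in line with the advice about not using the document numbers until the search is done.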