Subject: query boosting using a word list.
From: Mark Page
To: lucene-user@jakarta.apache.org
Date: Sat, 06 Nov 2004 12:22:54 +0000

Hi,

I have a database table of text, flattened out and indexed.
Although searching with a fuzzy query works well in most instances, the target record occasionally appears far down the list of matching records. This is because the query text is pulled from another source and may contain many terms that are irrelevant in the context of the app.

To solve this I need to create a word list, so that the terms that are important to the app are boosted in the search. As an example, the word list contains car manufacturers and models:

  ...
  volkswagon
  golf
  polo
  ...

  query text = "gleaming white 2-door volkswagon golf"
  search     = "gleaming white 2 door volkswagon^9 golf^9"

I can use regexes to massage the raw query text, but was wondering whether there is a more elegant solution available within the Lucene API.

As a Lucene newbie, I would appreciate any pointers or suggestions for solving what must be quite a common scenario.

Regards,
-Mark.
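
For illustration, the rewrite described in the message (tokenize the raw query text, then append a boost to any term found in the word list) can be sketched in plain Java without regex substitution on the whole string. This is only a sketch under stated assumptions: the class name WordListBooster and its rewrite method are hypothetical and not part of the Lucene API, matching is assumed to be case-insensitive, and tokens are assumed to split on anything that is not a letter or digit. The output uses the term^boost syntax from the example, which Lucene's QueryParser understands.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper (not part of Lucene): rewrites raw query text so that
// terms found in a word list carry a boost suffix in query-parser syntax.
class WordListBooster {
    private final Set<String> importantTerms = new HashSet<>();
    private final int boost;

    WordListBooster(Set<String> wordList, int boost) {
        // Assumption: word-list matching is case-insensitive.
        for (String term : wordList) {
            importantTerms.add(term.toLowerCase(Locale.ROOT));
        }
        this.boost = boost;
    }

    String rewrite(String rawQuery) {
        StringBuilder out = new StringBuilder();
        // Split on any run of characters that are neither letters nor digits,
        // so "2-door" becomes the two tokens "2" and "door".
        String[] tokens = rawQuery.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() yields a leading empty string if the text starts with a separator
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(token);
            if (importantTerms.contains(token)) {
                out.append('^').append(boost); // e.g. golf -> golf^9
            }
        }
        return out.toString();
    }
}
```

With a word list of volkswagon, golf and polo and a boost of 9, calling rewrite on "gleaming white 2-door volkswagon golf" returns "gleaming white 2 door volkswagon^9 golf^9", which can then be handed to the query parser. Because the tokenizer drops punctuation, characters that are special in query syntax are also removed from the raw text; whether that is desirable depends on the application.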