Subject: Re: query boosting using a word list.
From: Mark Page
To: Lucene Users List
Date: Sat, 06 Nov 2004 19:50:58 +0000

Hi Otis, thanks for the reply.
I'll have to read up on the javadocs a bit more before I fully understand
your answer. ;)

One thing I had noticed in the docs (and was half expecting to get pointed
towards) is the WordlistLoader class. I was thinking that I could maybe use
this to create a kind of 'reverse' of the stop-word list (boosting instead
of removing), or am I barking up the wrong tree?

regards, -Mark.

On Sat, 2004-11-06 at 17:52, Otis Gospodnetic wrote:
> Hello Mark,
>
> It sounds like you could extend QueryParser and override one of the
> Query get***Query methods (getFieldQuery?), perhaps first calling the
> super method, and then adding a boost based on the words, which you
> would look up in your implementation of the getFieldQuery method.
>
> Otis
>
>
> --- Mark Page wrote:
>
> > Hi,
> >
> > I have a database table of text flattened out and indexed.
> >
> > Although searching with fuzzy query works well in most instances, on
> > occasions however the target record appears way down the list of
> > matching records.
> >
> > This is because the query text may contain lots of irrelevant terms
> > (in the context of the app) because the data is pulled from another
> > source.
> >
> > To solve this I need to create a word list, so that the terms that
> > are important to the app are boosted in the search. As an example...
> >
> > word list contains car manufacturers and models:-
> > ...
> > volkswagon
> > golf
> > polo
> > ...
> >
> > query text = "gleaming white 2-door volkswagon golf"
> >
> > search = "gleaming white 2 door volkswagon^9 golf^9"
> >
> > I can use regexes to massage the raw query text, but was wondering if
> > there is a more elegant solution available within the Lucene API.
> >
> > As a Lucene newbie any pointers or suggestion to solve what must be
> > quite a common scenario appreciated.
> >
> > Regards, -Mark.
-- 
Mark Page
WEBallistics
tel/fax: +44(0)20 7704 9885

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
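
For illustration only, here is a rough sketch of the word-list boosting described in the thread, written in plain Java with no Lucene dependency so it runs on its own. The class name BoostWordList and the method boostTerms are hypothetical, and the word set is hard-coded rather than loaded from a file. It only reproduces the query-string rewrite from the example ("volkswagon^9 golf^9"). Inside Lucene itself the same set lookup would presumably sit in an overridden getFieldQuery, as Otis suggests, with the boost applied through Query.setBoost instead of being appended as text.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class BoostWordList {

    /**
     * Hypothetical helper: rewrites a raw query string so that every term
     * found in boostWords gets a "^boost" suffix. Terms are split on
     * whitespace and hyphens, so "2-door" becomes "2 door" as in the
     * example from the thread.
     */
    public static String boostTerms(String rawQuery, Set<String> boostWords, int boost) {
        StringBuilder out = new StringBuilder();
        for (String term : rawQuery.trim().split("[\\s-]+")) {
            if (term.isEmpty()) {
                continue; // happens only when the input is blank
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(term);
            // Case-insensitive lookup; the word list is assumed to be lower-case.
            if (boostWords.contains(term.toLowerCase(Locale.ROOT))) {
                out.append('^').append(boost);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed word list, taken from the example in the thread.
        Set<String> words = new HashSet<>(Arrays.asList("volkswagon", "golf", "polo"));
        System.out.println(boostTerms("gleaming white 2-door volkswagon golf", words, 9));
    }
}
```

A set lookup per term keeps the cost independent of the word-list size, which matters if the list of manufacturers and models grows large. Note that this sketch does not escape any query-syntax characters in the raw text, so it is not a substitute for handling the boost inside the parser.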