To: java-user@lucene.apache.org
Subject: MoreLikeThis and setBoost
From: Donna L Gresh
Date: Tue, 20 Nov 2007 10:35:23 -0500

I've been stepping through the contrib MoreLikeThis class and was wondering if people can give opinions on why you would or would not use setBoost(true) for the MoreLikeThis object.

It seems a bit odd (at least to me) to boost the "good" terms in the query (based on the term's score), since won't the final score (once you use the query) already reflect the effect of the good terms through the tf-idf?

Is using boost in some way trying to make up for the fact that the query returned by the MLT object loses the term frequency of the terms in the reference document? That is, no matter how many times a term occurs in the reference document, the query remains the same, assuming that the term makes it into the query via the MLT heuristics.

Thanks for any words of wisdom,

Donna
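To make the question concrete, here is a minimal sketch of the two cases in plain Java, without using Lucene. It assumes that MLT scores each candidate term as its frequency in the reference document times its idf, and that with boost enabled each query clause is weighted by that score divided by the best term's score; that is one reading of the contrib source and may not match every version. The class name, method names, and the tf/idf values are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MltBoostSketch {

    // Toy stand-in for MLT term scoring (an assumption, not Lucene's code):
    // frequency in the reference document times idf.
    static Map<String, Double> termScores(Map<String, Integer> refTf,
                                          Map<String, Double> idf) {
        Map<String, Double> scores = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : refTf.entrySet()) {
            // Assumes every term in refTf also has an idf entry.
            scores.put(e.getKey(), e.getValue() * idf.get(e.getKey()));
        }
        return scores;
    }

    // Per-clause weight in the generated query: 1.0 for every term when
    // boost is off, score / bestScore when boost is on.
    static Map<String, Double> clauseWeights(Map<String, Double> scores,
                                             boolean boost) {
        double best = 0.0;
        for (double s : scores.values()) {
            best = Math.max(best, s);
        }
        Map<String, Double> weights = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet()) {
            double w = (boost && best > 0.0) ? e.getValue() / best : 1.0;
            weights.put(e.getKey(), w);
        }
        return weights;
    }

    public static void main(String[] args) {
        // Made-up reference document: "lucene" occurs 8 times, "index" twice.
        Map<String, Integer> refTf = new LinkedHashMap<>();
        refTf.put("lucene", 8);
        refTf.put("index", 2);

        // Made-up idf values, equal so that only the reference tf differs.
        Map<String, Double> idf = new LinkedHashMap<>();
        idf.put("lucene", 2.0);
        idf.put("index", 2.0);

        Map<String, Double> scores = termScores(refTf, idf);
        System.out.println("boost off: " + clauseWeights(scores, false));
        System.out.println("boost on:  " + clauseWeights(scores, true));
    }
}
```

Under these assumptions both terms get the same weight when boost is off, so the reference document's term frequencies drop out of the query. With boost on, "index" is weighted at a quarter of "lucene". The tf-idf applied at search time comes from the documents being matched, not from the reference document, so in this model the boost carries information that the unboosted query does not.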