Subject: Re: Stable score scaling; LSI again
From: Asad Sayeed
To: java-user@lucene.apache.org
Reply-To: java-user@lucene.apache.org
Date: Tue, 15 Jul 2008 11:22:21 -0400

In other words, for my first question, what I want to know is how I might consistently and correctly get the same max score for any pair of identical documents without having to rewrite major parts of Lucene. I could find ALL the scores and divide them by the max, but that seems somehow wrong and not robust, especially since, if I put the identical documents into the index several times, I get slightly different scores from a MoreLikeThis query.

Yours,
--Asad.

Asad Sayeed/Watson/IBM@IBMUS
07/14/2008 10:15 PM
Please respond to java-user@lucene.apache.org

To: java-user@lucene.apache.org
cc:
Subject: Stable score scaling; LSI again

Hi,

I have a couple of questions about how to alter the similarity scores. I need scores that can be thresholded, and whose thresholds remain stable even when I add documents to the IndexWriter; i.e., identity should be a fixed value such as 1.0. I know that for efficiency reasons, Lucene doesn't do this. However, that level of efficiency is not as big a concern for me as getting a stable, thresholdable similarity score from, e.g., "normal" cosine similarity. Is there a way to change the DefaultSimilarity trivially to get this feature, or is it a major overhaul? The reason is that the search results from Lucene are being fed to another analyzer, so when the "identity" score changes because docs were added to the index, it messes up the rest of the processing.

The other question I had was about scoring via Latent Semantic Indexing.
I read in the archives of this list from way back when that LSI was hard to integrate into Lucene. Is that still the case? I mean, from what I understand, it is just transforming the index in some way.

Yours,
--Asad.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
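
For illustration only, here is a minimal sketch of the "normal" cosine similarity the message refers to, written in plain Java with no Lucene calls. It is not DefaultSimilarity and not how Lucene scores; the class name CosineSketch, the whitespace tokenization, and the use of raw term counts (no idf) are all assumptions made for the example. Because nothing in it depends on other documents in a collection, a document compared with itself scores 1.0 no matter what else is indexed, which is the stable, thresholdable property being asked for.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; not part of Lucene.
public class CosineSketch {

    // Build a raw term-frequency vector from lower-cased, whitespace-separated tokens.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine similarity of two term-frequency vectors; returns 0.0 if either is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int x = e.getValue();
            normA += (double) x * x;
            Integer y = b.get(e.getKey());
            if (y != null) {
                dot += (double) x * y;
            }
        }
        for (int y : b.values()) {
            normB += (double) y * y;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / Math.sqrt(normA * normB);
    }

    public static void main(String[] args) {
        Map<String, Integer> doc = termFrequencies("stable score scaling in lucene");
        Map<String, Integer> other = termFrequencies("latent semantic indexing in lucene");
        System.out.println("identical: " + cosine(doc, doc));
        System.out.println("different: " + cosine(doc, other));
    }
}
```

Since the result is bounded by 0.0 and 1.0 and identity is always 1.0, a fixed threshold keeps its meaning as documents are added. The trade-off is that this ignores idf weighting entirely, so it is a different measure from Lucene's score, not a rescaling of it.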