From: "Daniel Einspanjer"
To: java-user@lucene.apache.org
Date: Tue, 10 Apr 2007 20:03:32 -0400
Subject: Ideas for a relevance score that could be considered stable across multiple searches with the same query structure?

I asked this question on the Solr user list because Solr is the Lucene server implementation I'm currently using, but I didn't get any feedback there, and the problem isn't really Solr-specific, so I thought I'd cross-post here in case any non-Solr users have some ideas.

Thank you very much for your time,
Daniel

---------- Forwarded message ----------
From: Daniel Einspanjer
Date: Apr 10, 2007 8:04 AM
Subject: Ideas for a relevance score that could be considered stable across multiple searches with the same query structure?
To: solr-user@lucene.apache.org

I did a bit of research on the list for prior discussions of normalized scores and such. Please forgive me if I overlooked something relevant, but I didn't see anything that is exactly what I'm looking for.

I am building a replacement for our current text matching engine that takes a list of documents from feed A and finds the best match for each of those in the list of documents from feed B.
For purposes of this example, feed A and B might have the fields: title; director; year

The people reviewing this matching process need some way of determining why a particular match was made, beyond the overall score. Was it because the title was a perfect match, or because the title wasn't that close but the director and year were dead on?

My current idea for providing this information is to run the query four times (n + 1, where n is the number of scoring sections): once as a regular query to find the overall best match, then once per section, with each additional query grouping, requiring, and boosting a different section of the query. For each of those section queries, I would then store the rank of the "best" item returned by the overall query. That rank can be used to indicate the relevance of that item based on the defined criteria.

So, following the fields mentioned above, my queries would be:

The natural "overall" query:
(title:"feed A item one title"^10 (+title:feed~ +title:A~ +title:item~ +title:one~ +title:title~)) director:"Director, Feed A." (year:1974^10 year:[1972 TO 1976])

The query for title relevance:
+((title:"feed A item one title"^10 (+title:feed~ +title:A~ +title:item~ +title:one~ +title:title~)))^100 director:"Director, Feed A." (year:1974^10 year:[1972 TO 1976])

The query for director relevance:
+(director:"Director, Feed A.")^100 (title:"feed A item one title"^10 (+title:feed~ +title:A~ +title:item~ +title:one~ +title:title~)) (year:1974^10 year:[1972 TO 1976])

The query for year relevance:
+((year:1974^10 year:[1972 TO 1976]))^100 (title:"feed A item one title"^10 (+title:feed~ +title:A~ +title:item~ +title:one~ +title:title~)) director:"Director, Feed A."
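
The four queries above follow a mechanical pattern, so they could be generated from the per-section clauses rather than written by hand. The following is only a rough sketch of that pattern: the `build_queries` helper, the `clauses` dictionary, and the `boost` parameter are names invented here for illustration, and the code only assembles query strings without calling Lucene or Solr.

```python
# Hypothetical helper (not part of Lucene or Solr): assembles the n + 1
# query strings described above from one clause per scoring section.
def build_queries(clauses, boost=100):
    """Return a dict with the "overall" query plus one query per section.

    Each section query requires (+) and boosts (^boost) that section's
    clause, then appends the other clauses unchanged, in their original order.
    """
    queries = {"overall": " ".join(clauses.values())}
    for section, clause in clauses.items():
        others = [c for name, c in clauses.items() if name != section]
        queries[section] = " ".join([f"+({clause})^{boost}"] + others)
    return queries


# The clauses from the example above, keyed by scoring section.
clauses = {
    "title": '(title:"feed A item one title"^10 '
             "(+title:feed~ +title:A~ +title:item~ +title:one~ +title:title~))",
    "director": 'director:"Director, Feed A."',
    "year": "(year:1974^10 year:[1972 TO 1976])",
}

queries = build_queries(clauses)
for name, query in queries.items():
    print(f"{name}: {query}")
```

Keeping the clauses in one place means the overall query and the section queries cannot drift apart when a clause changes.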
If the #1 item returned by the overall query ranked 1/10 for title, 3/10 for director, and 5/10 for year, and those three scoring sections were weighted equally, with each rank mapped onto a scale from 1.0 (rank 1) down to .10 (rank 10), then I would be able to display the following scores:

title: 1.0
director: .8
year: .6
overall: 2.4

I looked at the javadocs for the FunctionQuery class because it looked interesting, but the actual docs were a bit light and I wasn't able to determine whether it might help with this need.

Does this sound unreasonable to anyone? Is there a clearly better way I might have overlooked?

Thank you very much for your ideas and comments,
Daniel
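
For reference, the rank-to-score arithmetic in the example above (ranks 1, 3, and 5 out of 10 giving 1.0, .8, and .6) can be sketched as follows. This assumes a linear mapping from rank 1..10 onto 1.0 down to .10, which is one reading of the figures in the message. The `rank_to_score` function and its `depth` parameter are hypothetical names, and raising an error for an out-of-range rank is also an assumption.

```python
# Hypothetical sketch: convert the rank of the overall best match within
# each section query's result list into a per-section score.
def rank_to_score(rank, depth=10):
    """Map rank 1..depth linearly onto 1.0 down to 1/depth (assumed mapping)."""
    if not 1 <= rank <= depth:
        # Assumption: a rank outside the inspected result depth is an error.
        raise ValueError(f"rank must be between 1 and {depth}, got {rank}")
    return (depth - rank + 1) / depth


# Ranks from the example: 1/10 for title, 3/10 for director, 5/10 for year.
ranks = {"title": 1, "director": 3, "year": 5}
scores = {section: rank_to_score(rank) for section, rank in ranks.items()}

# Equal section weights, so the overall score is the plain sum.
overall = round(sum(scores.values()), 2)

for section, score in scores.items():
    print(f"{section}: {score}")
print(f"overall: {overall}")
```

Because the score depends only on rank position, it stays comparable across searches that share the same query structure, which raw Lucene scores do not guarantee.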