From: Sebastian Marius Kirsch
To: java-user@lucene.apache.org
Date: Sun, 13 Nov 2005 10:10:22 +0100
Subject: Re: About Combining Scores

On Sun, Nov 13, 2005 at 12:04:41AM +0100, Karl Koch wrote:
> My aim is to combine these two scores. The Lucene score is normalised
> between 0.0 and 1.0 (if the score exceeded 1.0 at some point) or less than
> 1.0 (if it did not). The user model looks the same in this perspective -
> although based on different data - a 1.0 means the maximum of relevance and
> a 0.0 a minimum of relevance. At the moment I am multiplying the Lucene
> score with the score produced by the user model. This means the resulting,
> combined score is a number between 0.0 and 1.0 and represents the merged view
> from both models - the IR view and the view of the user model.

I came across that question recently too; it seems to be a rather
under-researched topic in the literature. I used multiplication in the
end, because it's simple, it produces reasonable results, it has no
parameters to tune, and it's invariant to normalization: rescaling
either score by a positive constant does not change the resulting
ranking. (Don't make a model with tunable parameters if you don't know
how to tune them ...)

The most helpful paper I came across was this:

http://trec.nist.gov/pubs/trec13/papers/microsoft-cambridge.web.hard.pdf

It's about combining PageRank with a relevance score, but it contains
a good description of how they arrived at their scoring formula. They
transform the two measures to have roughly similar distributions and
then use a linear combination of them. They tuned the parameters using
a test corpus (which may be difficult or impossible for your
application). Their system was one of the best at TREC-13.

Regards, Sebastian

-- 
Sebastian Kirsch [http://www.sebastian-kirsch.org/]

NOTE: New email address! Please update your address book.
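
P.S. To make the normalization point concrete, here is a small sketch in
Python. Everything in it is an assumption for illustration: the document
ids and scores are made up, and the helper names (rank_by, product,
linear, rescale_ir) are hypothetical, not part of Lucene. It compares the
ranking from a product of the two scores with the ranking from an untuned
linear combination (weight 0.5), before and after the IR score is rescaled
by a constant factor.

```python
def product(ir, user):
    """Combine by multiplication; no parameters to tune."""
    return ir * user


def linear(ir, user, w=0.5):
    """Combine linearly; w is a tunable weight (0.5 is an arbitrary guess)."""
    return w * ir + (1 - w) * user


def rank_by(combine, docs):
    """Return document ids sorted by descending combined score."""
    return sorted(docs, key=lambda d: combine(*docs[d]), reverse=True)


def rescale_ir(docs, factor):
    """Multiply every IR score by a constant, as a different normalization would."""
    return {d: (ir * factor, user) for d, (ir, user) in docs.items()}


# Made-up example data: doc id -> (IR score, user model score)
docs = {"a": (0.9, 0.2), "b": (0.5, 0.5), "c": (0.2, 0.7)}
halved = rescale_ir(docs, 0.5)

print("product:", rank_by(product, docs), rank_by(product, halved))
print("linear: ", rank_by(linear, docs), rank_by(linear, halved))
```

With these numbers the product gives the same order in both cases, while
the linear combination with a fixed weight reorders the documents once the
IR score is rescaled. That is why a linear model needs its weight (and the
score distributions) tuned against test data, and the product does not.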