From: Steven Bethard
Date: Fri, 10 Apr 2009 17:13:44 -0700
To: java-user@lucene.apache.org
Subject: Re: exponential boosts

On 4/10/2009 12:56 PM, Steven Bethard wrote:
> I need to have a scoring model of the form:
>
>   s1(d, q)^a1 * s2(d, q)^a2 * ... * sN(d, q)^aN
>
> where "d" is a document, "q" is a query, "sK" is a scoring function, and
> "aK" is the exponential boost factor for that scoring function. As a
> simple example, I might have:
>
>   s1 = TF-IDF score matching "text" field (e.g. a TermQuery)
>   a1 = 1.0
>
>   s2 = TF-IDF score matching "author" field (e.g. a TermQuery)
>   a2 = 0.1
>
>   s3 = PageRank score (e.g. a FieldScoreQuery)
>   a3 = 0.5
>
> It's important that the "aK" parameters are exponents in the scoring
> function and not just multipliers because it allows me to do a
> particular kind of optimized search for the best parameter values.
>
> How can I achieve this? My first thought was just that I should set the
> boost factor for each query, but the boost factor is just a multiplier,
> right?
>
> My second thought was to subclass CustomScoreQuery and override
> customScore, but as far as I can tell, CustomScoreQuery can only combine
> a Query with a ValueSourceQuery, while I need to combine a Query with
> another Query (e.g. the example above with two TermQuery scores).

My third thought was to create a wrapper class that takes a Query and an
exponential boost factor. The wrapper class would delegate to the Query
for all methods except .weight(). For .weight(), it would return a
Weight wrapper that delegated to the Weight for all methods except
.getValue(). For .getValue(), it would return the original value, raised
to the appropriate exponent.

But will that really work, or am I going to mess up the normalization or
something else?

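For concreteness, here is a minimal sketch of the scoring formula quoted above, in plain Java with no Lucene classes. It assumes the per-component scores s1..sN have already been computed as positive doubles; the class name ExponentialBoost, the method names, and the sample score values are all hypothetical and only serve to illustrate the arithmetic. It also shows the equivalent log-space form, exp(a1*log(s1) + ... + aN*log(sN)), which is one way to see why an exponent is not the same thing as a multiplicative boost.

```java
public class ExponentialBoost {

    // Direct form: s1^a1 * s2^a2 * ... * sN^aN
    static double combine(double[] scores, double[] exponents) {
        if (scores.length != exponents.length) {
            throw new IllegalArgumentException("scores and exponents must have the same length");
        }
        double product = 1.0;
        for (int k = 0; k < scores.length; k++) {
            product *= Math.pow(scores[k], exponents[k]);
        }
        return product;
    }

    // Log-space form: exp(a1*log(s1) + ... + aN*log(sN)).
    // Mathematically equal to combine() for positive scores.
    static double combineLog(double[] scores, double[] exponents) {
        if (scores.length != exponents.length) {
            throw new IllegalArgumentException("scores and exponents must have the same length");
        }
        double sum = 0.0;
        for (int k = 0; k < scores.length; k++) {
            sum += exponents[k] * Math.log(scores[k]);
        }
        return Math.exp(sum);
    }

    public static void main(String[] args) {
        // Hypothetical component scores for one document:
        // "text" TF-IDF, "author" TF-IDF, PageRank.
        double[] scores = {3.2, 1.5, 0.8};
        // Exponents a1, a2, a3 taken from the example in the message.
        double[] exponents = {1.0, 0.1, 0.5};

        System.out.println("direct:    " + combine(scores, exponents));
        System.out.println("log-space: " + combineLog(scores, exponents));
    }
}
```

In the log-space form the exponents act as linear weights on the log scores, whereas a plain multiplier boost b*s only adds a constant log(b) to the log score regardless of the document.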
Steve