From: scott w
To: java-user@lucene.apache.org
Date: Fri, 9 Oct 2009 14:19:46 -0700
Subject: Re: Question about how to speed up custom scoring
Message-ID: <401b577e0910091419pa43264bs8236bbda8daab4fa@mail.gmail.com>

Right exactly. I looked into payload initially and realized it wouldn't work
for my use case.

On Fri, Oct 9, 2009 at 2:00 PM, Grant Ingersoll wrote:

> Oops, just reread and realized you wanted query time weights. Payloads are
> an index time thing.
>
>
> On Oct 9, 2009, at 5:49 PM, Grant Ingersoll wrote:
>
>> If you are trying to add specific term weights to terms in the index and
>> then incorporate them into scoring, you might benefit from payloads and the
>> PayloadTermQuery option. See
>> http://www.lucidimagination.com/blog/2009/08/05/getting-started-with-payloads/
>>
>> -Grant
>>
>> On Oct 8, 2009, at 11:56 AM, scott w wrote:
>>
>>> Oops, forgot to include the class I mentioned.
>>> Here it is:
>>>
>>> import java.io.IOException;
>>> import java.util.Map;
>>> import org.apache.lucene.document.Document;
>>> import org.apache.lucene.index.IndexReader;
>>> import org.apache.lucene.search.Query;
>>> import org.apache.lucene.search.function.CustomScoreQuery;
>>>
>>> public class QueryTermBoostingQuery extends CustomScoreQuery {
>>>   // Field name -> query-time weight
>>>   private Map<String, Float> queryTermWeights;
>>>   private float bias;
>>>   private IndexReader indexReader;
>>>
>>>   public QueryTermBoostingQuery( Query q, Map<String, Float> termWeights,
>>>       IndexReader indexReader, float bias) {
>>>     super( q );
>>>     this.indexReader = indexReader;
>>>     if (bias < 0 || bias > 1) {
>>>       throw new IllegalArgumentException( "Bias must be between 0 and 1" );
>>>     }
>>>     this.bias = bias;
>>>     queryTermWeights = termWeights;
>>>   }
>>>
>>>   @Override
>>>   public float customScore( int doc, float subQueryScore, float valSrcScore ) {
>>>     // Loads the stored fields of every scored document
>>>     Document document;
>>>     try {
>>>       document = indexReader.document( doc );
>>>     } catch (IOException e) {
>>>       throw new SearchException( e );
>>>     }
>>>     float termWeightedScore = 0;
>>>
>>>     // Sum weight * stored value over the weighted fields present in the doc
>>>     for (String field : queryTermWeights.keySet()) {
>>>       String docFieldValue = document.get( field );
>>>       if (docFieldValue != null) {
>>>         Float weight = queryTermWeights.get( field );
>>>         if (weight != null) {
>>>           termWeightedScore += weight * Float.parseFloat( docFieldValue );
>>>         }
>>>       }
>>>     }
>>>     return bias * subQueryScore + (1 - bias) * termWeightedScore;
>>>   }
>>> }
>>>
>>> On Thu, Oct 8, 2009 at 7:54 AM, scott w wrote:
>>>
>>>> I am trying to come up with a performant query that will allow me to use
>>>> a custom score, where the custom score is a sum-product over a set of
>>>> query time weights and each weight gets applied only if the query time
>>>> term exists in the document. So for example, if I have a doc with three
>>>> fields: company=Microsoft, city=Redmond, and size=large, I may want to
>>>> score that document according to the following function:
>>>> (company==Microsoft ? 0.3 : 0) + (size==large ? 0.5 : 0) to get a score
>>>> of 0.8. Attached is a subclass I have tested that implements this with
>>>> one extra component, which is that it allows the relevance score to be
>>>> combined in.
>>>>
>>>> The problem is this custom score is not performant at all. For example,
>>>> on a small index of 5 million documents with 10 weights passed in, it
>>>> does 0.01 req/sec.
>>>>
>>>> Are there ways to compute the same custom score but in a much more
>>>> performant way?
>>>>
>>>> thanks,
>>>> Scott
>>>>
>>>>
>> --------------------------
>> Grant Ingersoll
>> http://www.lucidimagination.com/
>>
>> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
>> Solr/Lucene:
>> http://www.lucidimagination.com/search
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
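
One direction that might help with the throughput problem in the quoted question: the quoted class calls indexReader.document( doc ) for every scored hit, which loads stored fields from the index each time and is a likely source of the slowdown. A possible alternative is to read each weighted field once into an in-memory float array indexed by document number, then do only array lookups while scoring. The sketch below is plain Java with no Lucene dependency. ColumnScorer and its members are hypothetical names, and the assumption that a document without the field holds 0 in the array is mine, chosen so that it contributes nothing, as in the original loop.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch only: computes bias * subQueryScore + (1 - bias) * sum(weight * value)
 * from per-field float arrays held in memory. Hypothetical class, not Lucene API.
 */
final class ColumnScorer {
    private final float[][] columns; // columns[i][doc] = value of weighted field i for doc
    private final float[] weights;   // weights[i] = query-time weight of field i
    private final float bias;

    ColumnScorer(Map<String, float[]> fieldValues, Map<String, Float> termWeights, float bias) {
        if (bias < 0 || bias > 1) {
            throw new IllegalArgumentException("Bias must be between 0 and 1");
        }
        this.bias = bias;
        // Resolve field names to arrays once, so scoring does no map lookups.
        float[][] cols = new float[termWeights.size()][];
        float[] ws = new float[termWeights.size()];
        int n = 0;
        for (Map.Entry<String, Float> e : termWeights.entrySet()) {
            float[] col = fieldValues.get(e.getKey());
            if (col != null && e.getValue() != null) { // skip fields with no data or no weight
                cols[n] = col;
                ws[n] = e.getValue();
                n++;
            }
        }
        this.columns = Arrays.copyOf(cols, n);
        this.weights = Arrays.copyOf(ws, n);
    }

    /** Assumption: a document lacking a field has 0 in that field's array. */
    float score(int doc, float subQueryScore) {
        float termWeightedScore = 0f;
        for (int i = 0; i < columns.length; i++) {
            termWeightedScore += weights[i] * columns[i][doc];
        }
        return bias * subQueryScore + (1 - bias) * termWeightedScore;
    }
}
```

In Lucene 2.9 the arrays could plausibly come from FieldCache.DEFAULT.getFloats( reader, field ), assuming each weighted field is indexed as a single untokenized numeric term per document; the class in the quoted message reads stored values instead, so the index might need to change. A customScore override would then delegate to score( doc, subQueryScore ). Whether this is fast enough for 5 million documents and 10 weights would need measuring.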