Date: Fri, 9 May 2008 10:32:49 -0400
From: "Erick Erickson"
To: java-user@lucene.apache.org
Subject: Re: Using stored fields for scoring

Well, all things are possible... but I don't think there's a way to get
the field from each document efficiently at scoring time. It looks like
you're already lazy-loading the field, which was going to be my
suggestion. You could get at it much faster if you *did* index it
(UN_TOKENIZED?) and went after it with TermDocs/TermEnum.

So what is the nature of the field you're using? Is it possible to
build up the list of doc<->binaryfield at, say, startup time and just
use a map or some such?

You could even think about putting all the binary data in your index in
a special document that has a field (or fields) orthogonal to all other
documents. Essentially, take the map I suggested above and stuff it in
a doc with one field (say, "MySpecialMapField"). Then read *that*
document in at startup (or even at search time) to get your binary
field for scoring. All this presupposes that your binary field/doc_id
map will fit in memory.

What about index-time boosting? This only does you good if your binary
data above is some sort of importance ranking.
Index-time boosting says something like "this document's title is more
important than normal", so it would *automatically* affect your
scoring. You'd have to apply the index-time boosts selectively to the
fields you want.

And if none of this is relevant, could you expand a bit more on what
you're trying to do? What is the nature and purpose of the field you
want to use to influence scoring?

Best
Erick

On Fri, May 9, 2008 at 10:07 AM, Paolo Capriotti wrote:

> Hi all,
> I am looking for a way to include a stored (non-indexed) field in the
> computation of scores for a query.
> I have tried using a ValueSourceQuery with a ValueSource subclass that
> simply retrieves the document and gets the field, like:
>
> public float floatVal(int doc) {
>     reader.document(doc, selector).getBinaryValue("myfield");
>     ....
> }
>
> but that's too slow, because it ends up doing a lookup for each matching
> document.
> Is it possible to use a stored field in a FunctionQuery or ValueSourceQuery
> in an efficient way (i.e. not dependent on the number of retrieved
> documents)?
> If the answer is yes, is it possible to update such a value in place
> without removing and reindexing the document?
>
> Thanks in advance.
>
> Paolo Capriotti
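
For the in-memory map suggested in the reply above, a minimal sketch in
plain Java might look like the following. It deliberately uses no Lucene
classes: StoredFieldCache, the loader callback, and the idea that the
binary field holds a 4-byte big-endian float are all hypothetical
stand-ins, since the thread does not say what the field contains. The
loader represents whatever slow per-document read is available (for
example the reader.document(...) call from the quoted question), and it
is called once per document at startup rather than once per hit.

```java
import java.nio.ByteBuffer;
import java.util.function.IntFunction;

/** Hypothetical cache that decodes every document's binary field once, up front. */
final class StoredFieldCache {
    // One slot per document id; memory cost is 4 bytes per document.
    private final float[] values;

    /**
     * @param maxDoc number of document ids to cover (0 .. maxDoc - 1)
     * @param loader stand-in for the slow stored-field read; returns the raw
     *               bytes for a document id, or null if the document has none
     */
    StoredFieldCache(int maxDoc, IntFunction<byte[]> loader) {
        values = new float[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            byte[] raw = loader.apply(doc);
            // Assumption: the field is a 4-byte big-endian float.
            // Missing or short values fall back to 0.
            values[doc] = (raw == null || raw.length < 4)
                    ? 0f
                    : ByteBuffer.wrap(raw).getFloat();
        }
    }

    /** Constant-time lookup, cheap enough to call for every matching document. */
    float floatVal(int doc) {
        return values[doc];
    }
}
```

A floatVal(int doc) like the one in the quoted question could then
delegate to this array instead of loading the document. Two caveats
apply: the array has to fit in memory, as the reply already notes, and
because document ids are only meaningful for a given index reader, the
cache would need to be rebuilt whenever the reader is reopened.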