Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Message-ID: <359a92830805090732l48efe930j4a0a82b814780d8e@mail.gmail.com>
Date: Fri, 9 May 2008 10:32:49 -0400
From: "Erick Erickson" <erickerickson@gmail.com>
To: java-user@lucene.apache.org
Subject: Re: Using stored fields for scoring
In-Reply-To: <48245AB3.2000102@deri.org>
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_Part_3317_7277977.1210343569265"
References: <48245AB3.2000102@deri.org>

------=_Part_3317_7277977.1210343569265
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Well, all things are possible <G>.... But I don't think there's a way to get
the field from each document at scoring time efficiently. It looks like
you're already lazy-loading the field, which was going to be my suggestion.

You could get it much faster if you *did* index it (UN_TOKENIZED?) and
went after it with TermDocs/TermEnum.....
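
A rough sketch of that idea, in plain Java so it runs without Lucene on
the classpath. The TreeMap is a hypothetical stand-in for the term
dictionary (what TermEnum would walk) and each int[] for one term's
doc ids (what TermDocs would walk). The class and method names are made
up for illustration, and it assumes the indexed term text parses as a
float:

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for an indexed, untokenized field: each distinct
// term maps to the ids of the documents that contain it.
final class TermWalkSketch {

    // Fill one float per document by visiting every term exactly once.
    static float[] loadValues(TreeMap<String, int[]> postings, int maxDoc) {
        float[] values = new float[maxDoc]; // docs without the field stay 0.0f
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            // Parse the term text once, then assign it to all of its docs.
            float value = Float.parseFloat(entry.getKey());
            for (int doc : entry.getValue()) {
                values[doc] = value;
            }
        }
        return values;
    }
}
```

The walk costs roughly one step per (term, doc) pair and is done once,
rather than one stored-field lookup per matching document per query.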

So what is the nature of the field you're using? Is it possible to build
up the list of doc<->binaryfield at, say, startup time and just use a
map or some such?
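
Something like the following is what I have in mind, as a minimal
sketch. StoredValueLoader is a hypothetical interface standing in for
whatever fetches and decodes the stored field of one document (the
reader.document(...) call in your snippet), StartupCache is a made-up
name, and I'm assuming the value boils down to one float per document:

```java
// Hypothetical stand-in for "fetch and decode the stored field of one doc".
interface StoredValueLoader {
    float load(int doc);
}

final class StartupCache {
    private final float[] values;

    // Pay the slow per-document lookup once, at startup.
    StartupCache(StoredValueLoader loader, int maxDoc) {
        values = new float[maxDoc];
        for (int doc = 0; doc < maxDoc; doc++) {
            values[doc] = loader.load(doc);
        }
    }

    // What floatVal(int doc) could delegate to: a plain array read.
    float floatVal(int doc) {
        return values[doc];
    }
}
```

At one float per document the array costs 4 bytes times the number of
documents, so about 40 MB for 10 million docs. Doc ids can shift when
segments are merged, so the array would need rebuilding whenever the
reader is reopened.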

You could even think about putting all the binary data in your
index in a special document whose field(s) are orthogonal to
those of all other documents. Essentially, take the map I suggested
earlier and stuff it in a doc with one field (say,
"MySpecialMapField"). Then read *that* document in
at startup (or even search time) to get your binary field for
scoring.
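
One way the packing could look, purely as a sketch: PackedValues and its
methods are hypothetical names, the layout (one 4-byte float per
document, in doc-id order) is my assumption, and storing the resulting
byte[] in the special document's binary field is left out:

```java
import java.nio.ByteBuffer;

final class PackedValues {

    // Pack one float per document, in doc-id order, into a single byte array
    // that could be stored as the one binary field of the special document.
    static byte[] pack(float[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * 4);
        for (float value : values) {
            buffer.putFloat(value);
        }
        return buffer.array();
    }

    // Reverse of pack(): position i in the result is the value for doc id i.
    static float[] unpack(byte[] bytes) {
        ByteBuffer buffer = ByteBuffer.wrap(bytes);
        float[] values = new float[bytes.length / 4];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getFloat();
        }
        return values;
    }
}
```

Keying by position only works while doc ids stay put; keying by some
stable id of your own would be more robust, at the cost of a map lookup.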

All this presupposes that your binary field/doc_id map will fit
in memory....

What about index-time boosting? This only does you any good
if your binary data above is some sort of importance ranking.
Index-time boosting says something like "This document title
is more important than normal", so it would *automatically*
affect your scoring. You'd have to apply the index-time boosts
selectively to the fields you want....

And if none of this is relevant, could you expand a bit more on what
you're trying to do? What is the nature and purpose of the
field you want to use to influence scoring?

Best
Erick

On Fri, May 9, 2008 at 10:07 AM, Paolo Capriotti <paolo.capriotti@deri.org>
wrote:

> Hi all,
> I am looking for a way to include a stored (non-indexed) field in the
> computation of scores for a query.
> I have tried using a ValueSourceQuery with a ValueSource subclass that
> simply retrieves the document and gets the field, like:
>
> public float floatVal(int doc) {
>  reader.document(doc, selector).getBinaryValue("myfield");
>  ....
> }
>
> but that's too slow, because it ends up doing a lookup for each matching
> document.
> Is it possible to use a stored field in a FunctionQuery or ValueSourceQuery
> in an efficient way (i.e. not dependent on the number of retrieved
> documents)?
> If the answer is yes, is it possible to update such a value in place
> without removing and reindexing the document?
>
> Thanks in advance.
>
> Paolo Capriotti
>

------=_Part_3317_7277977.1210343569265--