lucene-java-user mailing list archives

From "Zhang, Lisheng" <Lisheng.Zh...@BroadVision.com>
Subject RE: Score per position
Date Tue, 13 Dec 2011 08:11:56 GMT
Hi,

1) I have not read the IndexDocValues code, but it looks like it may not fully meet your needs;
   other experts may comment.

2) No, that's not what I meant. I meant that at index time we can create a field containing
   as much useful information as possible for later scoring; then in CustomScoreQuery we can
   read this field through FieldCache and use it.

Best regards, Lisheng

-----Original Message-----
From: arnon ma [mailto:arnon.ma@yahoo.com]
Sent: Sunday, December 11, 2011 3:44 AM
To: java-user@lucene.apache.org
Subject: Re: Score per position


Lisheng, thanks for the response.
 
If I understand correctly, both IndexDocValues and FieldCache are based on a single value
per Document per Field, which can be taken into account for scoring. Is that correct?
We need a value per Document per Field *per Term*, like term frequency. Can IndexDocValues
represent this as well?
 
Regarding phrase queries, assume that the phrase "good morning" appears twice in a document,
in positions 8-9 and 15-16. Then the phrase frequency is 2. But what we need here instead
is the payload attached to each of the positions, e.g. 0.5,0.5 for 8-9 and 0.8,0.7 for 15-16,
so that the total score is 0.5*0.5+0.8*0.7. So inside CustomScoreQuery we essentially need to fetch
the payloads of "good" and "morning" separately (maybe using TermPositions?), and use them to
score the document. Is this what you meant?
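
To make the arithmetic above concrete, here is a minimal sketch in plain Java. It does not use any Lucene API and does not show how the payloads would be fetched; the class and method names are hypothetical. It assumes the payloads of each phrase occurrence have already been decoded into an array, multiplies them within an occurrence, and sums the products over occurrences.

```java
public class PhraseScoreSketch {

    /**
     * Hypothetical helper: each inner array holds the decoded payloads of the
     * terms in one phrase occurrence. The occurrence score is the product of
     * its payloads; the document score is the sum over all occurrences.
     */
    public static double phraseScore(double[][] occurrencePayloads) {
        double total = 0.0;
        for (double[] occurrence : occurrencePayloads) {
            double product = 1.0;
            for (double payload : occurrence) {
                product *= payload;
            }
            total += product;
        }
        return total;
    }

    public static void main(String[] args) {
        // "good morning" at positions 8-9 (0.5, 0.5) and 15-16 (0.8, 0.7)
        double[][] goodMorning = { {0.5, 0.5}, {0.8, 0.7} };
        System.out.println(phraseScore(goodMorning)); // approximately 0.81
    }
}
```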
 
Thanks,
Arnon.

From: "Zhang, Lisheng" <Lisheng.Zhang@BroadVision.com>
To: java-user@lucene.apache.org; arnon ma <arnon.ma@yahoo.com> 
Sent: Thursday, December 8, 2011 7:30 PM
Subject: RE: Score per position

Hi,

A few days ago I asked a similar question:

1) In the upcoming Lucene 4.0, there is a feature that is sort of like a payload at the document level:

>lucene 4 has a feature called IndexDocValues which is essentially a
> payload per document per field.
>
> you can read about it here:
> http://www.searchworkings.org/blog/-/blogs/introducing-lucene-index-doc-values
> http://www.searchworkings.org/blog/-/blogs/apache-lucene-flexiblescoring-with-indexdocvalues
> http://www.searchworkings.org/blog/-/blogs/indexdocvalues-their-applications

2) You may consider using FieldCache along with CustomScoreQuery (my case is a timestamp
field, but we can put whatever logic we want into a customized field at indexing time).

>>> you can simply index your timestamp (untokenized) and wrap your query
>>> in a CustomScoreQuery. This query accepts your user query and a
>>> ValueSource. During search CustomScoreQuery calls your valuesource for
>>> each document that the user query scores and multiplies the result of
>>> the ValueSource into the score. Inside your valuesource you can simply
>>> get the timestamps from the FieldCache and calculate your custom
>>> boost...
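
The multiplication described in the quoted text can be sketched in plain Java as follows. This is only an illustration of the logic, not the CustomScoreQuery or ValueSource API: the arrays stand in for the per-document query scores and the cached field values, and the class name, method names, and the recency formula are all assumptions made for the example.

```java
import java.util.function.LongToDoubleFunction;

public class CustomBoostSketch {

    private static final double MILLIS_PER_DAY = 86_400_000.0;

    /** Hypothetical boost: 1 / (1 + age in days); newer documents score higher. */
    public static double recencyBoost(long timestampMillis, long nowMillis) {
        double ageDays = (nowMillis - timestampMillis) / MILLIS_PER_DAY;
        return 1.0 / (1.0 + ageDays);
    }

    /**
     * For each document, multiply the score of the user query by a boost
     * computed from that document's cached field value (array index = doc id).
     */
    public static double[] customScores(double[] queryScores,
                                        long[] cachedValues,
                                        LongToDoubleFunction boost) {
        double[] result = new double[queryScores.length];
        for (int doc = 0; doc < queryScores.length; doc++) {
            result[doc] = queryScores[doc] * boost.applyAsDouble(cachedValues[doc]);
        }
        return result;
    }

    public static void main(String[] args) {
        long now = 1_000_000_000_000L;
        double[] queryScores = {2.0, 2.0};
        long[] timestamps = {now, now - 86_400_000L}; // today, one day old
        double[] boosted = customScores(queryScores, timestamps,
                ts -> recencyBoost(ts, now));
        System.out.println(boosted[0] + " " + boosted[1]); // 2.0 1.0
    }
}
```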

Best regards, Lisheng


-----Original Message-----
From: arnon ma [mailto:arnon.ma@yahoo.com]
Sent: Wednesday, December 07, 2011 4:26 AM
To: java-user@lucene.apache.org
Subject: Score per position


We have an application where every term position in a document is associated with an "engine
score".
A term query should then be scored according to the sum of the "engine scores" of the term in
a document, rather than according to the term frequency.
For example, a term frequency of 5 with an average engine score of 100 should be equivalent
to a term frequency of 1 with an engine score of 500.
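
The equivalence in this example can be checked with a minimal plain-Java sketch. It does not touch Lucene payloads; the class and method names are hypothetical, and the per-position engine scores are assumed to be already available as an array.

```java
public class EngineScoreSum {

    /** Sums the per-position engine scores of one term in one document. */
    public static double sumEngineScores(double[] positionScores) {
        double sum = 0.0;
        for (double score : positionScores) {
            sum += score;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Five occurrences averaging 100 versus one occurrence scoring 500.
        double five = sumEngineScores(new double[] {100, 100, 100, 100, 100});
        double one = sumEngineScores(new double[] {500});
        System.out.println(five + " vs " + one); // 500.0 vs 500.0
    }
}
```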
 
I understand that if I keep the engine score per position in the payload, I will be able to
use scorePayload in combination with a summing version of PayloadFunction to get the sum of
the engine scores of a term in a document, and so achieve my goal.
 
There are two issues with this solution:
1. Even the simplest term query would have to scan the positions file in order to get the payloads,
which could be a performance issue.
We would prefer to index the sum of engine scores in advance per document, in addition to
the term frequency. This is a sort of payload at the document level. Does Lucene support
that, or have any other solution for this issue?
 
2. The "engine score" of a phrase occurrence is defined as the product of the engine scores
of the terms that compose the phrase.
So in scorePayload I need the payloads of all the terms in the phrase in order to be able
to score the phrase occurrence appropriately.
As far as I understand, the current interface of scorePayload does not provide this information.
Is there another way this can be achieved in Lucene?
 
Thanks in advance,
Arnon.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
