lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From David Birtwell <David_Birtw...@vwr.com>
Subject Re: Question: using boost for sorting
Date Tue, 15 Oct 2002 14:56:19 GMT
Hi Dmitry,

I was faced with a similar problem.  We wanted to have a numeric rank 
field in each document influence the order in which the documents were 
returned by lucene.  While investigating a solution for this, I wanted 
to see if I could implement strict sorting based on this numeric value. 
 I was able to accomplish this using document boosting, but not without 
modifying the lucene source.  Our "ranking" field is an integer value 
from one to one hundred.  I'm not sure if this will help you, but I'll 
include a summary of what I did.

In DocumentWriter remove the normalization by field length:
    float norm = fieldBoosts[n] * 
Similarity.normalizeLength(fieldLengths[n]);
to
    float norm = fieldBoosts[n];

In TermScorer and PhraseScorer, modify the score() method to ignore the 
lucene base score:
    score *= Similarity.decodeNorm(norms[d]);
to
    score = Similarity.decodeNorm(norms[d]);

In Similarity.java, make byteToFloat() public.

At index time, use Similarity.byteToFloat() to determine your boost 
value as in the following pseudocode:
    Document d = new Document();
    ... add your fields ...
    int rank = d.getField("RANK"); (range of rank can be 0 to 255)
    float sortVal = Similarity.byteToFloat(rank)
    d.setBoost(sortVal)

If you'd like the reasoning behind any or all of these items, let me know.

DaveB



Dmitry Serebrennikov wrote:

> Greetings Everyone,
>
> I'm thinking of trying to build something that manipulates a query 
> score in order to achieve a sort order other then the default 
> relevance sort. The idea is to create a new type of query:
> SortingQuery( Query query, String sortByField )
>
> It would run the sub-query and return results in an order of the 
> values found in the "sortByField" for those documents. Now, I've 
> looked at all of the sorting discussion prior to this, and the best 
> approach (recommended by Doug among others) is to provide some sort of 
> a fast access to the field values inside the HitCollector. Reading 
> documents at search time is too slow, so people access the data 
> elsewhere or build an in-memory index of that data (such as is done in 
> the SearchBean's SortField).
>
> My idea is different. I want to try to do the following:
> - compose a query that consists of the original sub-query followed by 
> a special "sorting query"
> - "boost" the score of the original sub-query to 0
> - compute the score of the sorting query such that it would reflect 
> the desired sort order
>
> Has anyone tried to do something like this?
> Would this work?
> Is this worth doing?
> If it would, would then I have to do something during the indexing 
> time to set normalization / scoring factors for that field to 
> something or other?
>
> Thanks.
> Dmitry.
>
>
>
> -- 
> To unsubscribe, e-mail:   
> <mailto:lucene-user-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail: 
> <mailto:lucene-user-help@jakarta.apache.org>
>
>



--
To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>


Mime
View raw message