lucenenet-user mailing list archives

From Artem Chereisky <a.cherei...@gmail.com>
Subject Re: Normalized scoring of queries against short documents?
Date Tue, 25 May 2010 22:31:52 GMT
Christian,
Have a look at the CustomScoreQuery class.
Art


-a
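
If the scores only need to fall within [0, 1] for a single result list, one simple option that needs no custom scoring class is to divide every raw score by the top score of that query. What follows is a minimal, library-independent sketch of that idea, not Lucene.Net API code: the function name normalize_scores, the (doc_id, score) tuples and the sample values are all made up for illustration.

```python
def normalize_scores(hits):
    """Scale (doc_id, raw_score) pairs so that the best hit scores 1.0.

    `hits` is a hypothetical list of (doc_id, raw_score) tuples, as one
    might collect from a search result; it is not a Lucene.Net type.
    """
    if not hits:
        return []
    top = max(score for _, score in hits)
    if top <= 0.0:
        # No positive scores to scale by; report everything as 0.0.
        return [(doc_id, 0.0) for doc_id, _ in hits]
    return [(doc_id, score / top) for doc_id, score in hits]


# Made-up raw scores, including one above 1.0 as seen for very short documents.
raw_hits = [("doc-a", 1.73), ("doc-b", 0.85), ("doc-c", 0.12)]
normalized = normalize_scores(raw_hits)
```

Scores rescaled this way are only relative to the best hit of the same query, so they are probably not comparable across different queries. Whether that is good enough depends on what the paper needs.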

On 26/05/2010, at 7:34, Christian Klauß <s8788740@inf.tu-dresden.de>  
wrote:

> Thank you for your reply.
> Given that someone knows the algorithms needed to achieve normalized
> scores, are there any hooks in Lucene for implementing custom scoring
> behaviour? If so, how complicated would that be (as in, how many
> classes would need to be derived from, and how much understanding of
> Lucene is required)?
>
> Thanks
> Christian
>
> Digy schrieb:
>> This is expected behaviour. A score can be any number greater than 0 and
>> isn't normalized (although it mostly falls into the [0, 1] range).
>>
>> DIGY
>>
>> -----Original Message-----
>> From: Christian Klauß [mailto:s8788740@inf.tu-dresden.de]
>> Sent: Tuesday, May 25, 2010 2:42 PM
>> To: lucene-net-user@lucene.apache.org
>> Subject: Normalized scoring of queries against short documents?
>>
>> Hi,
>> I am working on a small distributed document indexing system using
>> Lucene.Net. A fellow student of mine wants to use the system for a
>> paper but relies on the scores being normalized to [0, 1]. As far as I
>> understand, query/document scores are calculated on the basis of the
>> TF-IDF algorithm and are normalized to [0, 1]. For most documents and
>> queries I found this to be true, but in some cases Lucene seems to
>> produce scores greater than 1. After some tests, this only seems to
>> apply to queries (we are talking about simple word queries here)
>> against very short documents (no more than several words). Is this the
>> expected, correct behaviour? What are the constraints on the value
>> range of scores (as in, can it be guaranteed that queries against
>> documents with more than X words produce results within [0, 1])?
>>
>> I am using Lucene.Net 2.9.2 with the StandardAnalyzer and
>> mixed-language PDF documents (extracted through IKVM-based pdfbox).
>>
>> Thank you for your help
>> Christian
>>
>
