lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: Problem with scoring:is it absolute or relative ?
Date Sat, 12 May 2007 11:23:01 GMT
I might be able to get a change in place, but Im struggling to 
understand what exactly needs to change. By default the IndexSearcher 
uses a TopDocCollecter, but it also uses a Scorer, Ive taken a look but 
Im struggling to find the code
that takes the score and normalises it, I just want to remove the 
normalization aspect so if the query was

track:Minus OR artist:beck OR release:odelay

it would return 3.0 when all three values matched, 2.0 if two matched 
and 1.0 if only one matched, I would expect it to take it into account 
query boosting, so that
if the query was

track:Minus^3 OR artist:beck OR release:odelay

it would return 5 if all three terms matched, 4 if the track and artist 
term matched ecetera.

thanks for any help paul

Paul Taylor wrote:
> Hi Otis
>
> thanks, so the scores are relative to the result set.
>
> Unfortunately I only have access to an xmlwebservice that let me send 
> a lucene query so I don't have access to the various IndexSearcher 
> methods. So it sounds like rather than just taking the top result and 
> using that Ill have to take the first 10 results and work out my own 
> score based on how many fields match and the importance of each field, 
> unless anyone has any other ideas ?
>
> Otis Gospodnetic wrote:
>> Paul,
>>
>> That is because the Hits class is likely being used under the 
>> covers.  When you use the IndexSearcher's search(...) method that 
>> returns Hits, hit scores are normalized, so they are always between 0 
>> and 1.  If you want the raw scores, and it sounds like you do, you 
>> could use a lower-level search method, like the one that uses a 
>> HitCollector.
>>
>> Otis
>> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
>> Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
>>
>> ----- Original Message ----
>> From: Paul Taylor <paul_t100@fastmail.fm>
>> To: java-user@lucene.apache.org
>> Sent: Wednesday, May 2, 2007 10:30:12 AM
>> Subject: Problem with scoring:is it absolute or relative ?
>>
>> Hi I am having problems understanding lucenes scoring. I am using the 
>> Musicbrainz  which uses Lucene to provide searching facility over its 
>> data, which put simply consists of a database about recording artists 
>> , albums and song titles
>>
>> I can construct a query such as:
>>     track:Minus AND artist:beck AND release:odelay
>>
>> to return Tracks called Minus by the Artist Beck for the release 
>> Odelay, which works fine if all terms are correct however I am trying 
>> to implement the search in a program which takes the terms  from 
>> values embedded in existing audio files, so they may not be correct, 
>> so instead of using an AND search I want to use a OR search such as
>>     track:Minus  artist:beck  release:odelay
>> to improve the chances of getting a match, and then using the scoring 
>> returned work out whether the match is sufficiently simailr to accept 
>> it as a match.
>>
>> Now the good news is that when multiple results are returned the 
>> record with the top score is normally the best match BUT the bad news 
>> is that if the match is very poor the top rated records normally 
>> return a score of 100 even though
>> only a few of the terms match.
>>
>>     track:Minus  artist:rubbish  release:odelay
>>
>> will return the Minus tracks by beck for the release odelay even 
>> through the artist term is incorrect because this is the best match 
>> it can make (2 out of 3 terms matched) but it is returning a score of 
>> 100 when I would expect a score
>> of 66 because only two of the terms match.
>>
>> Why is it doing this, and how can I create a query that only returns 
>> 100 if all terms match but will return possible matches where only 
>> some terms match ? I have spoke to the creators of MusicBrainz they 
>> say they are not using any kind of document/Field boosting and are 
>> that sure how its work as theve only added lucene to the site quite 
>> recently.
>>
>> The above queries can be run tested from
>>
>> http://musicbrainz.org/search.html
>>
>> by:
>>     entering the query in the 'Search For' box
>>     selecting type:Track
>>     selecting Use advanced query syntax 
>> <http://musicbrainz.org/popup/TextSearchSyntax>
>>
>> and clicking on Indexed Search
>>
>> thanks Paul Taylor
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>>   
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message