lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: Problem with scoring:is it absolute or relative ?
Date Wed, 02 May 2007 16:53:43 GMT
Hi Otis,

Thanks, so the scores are relative to the result set.
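
To illustrate what "relative" means here, a minimal sketch, assuming the Hits behaviour of Lucene releases from that period: raw scores are divided by the top raw score, but only when that top score is greater than 1.0. The class and method names below are hypothetical, not Lucene API, and the 0 to 100 scale shown by the web service is assumed to be this normalized value multiplied by 100.

```java
/** Hypothetical illustration of Hits-style score normalization (not Lucene API). */
final class ScoreNormalization {

    /** Scales raw scores by the top score, but only if that top score exceeds 1.0. */
    static float[] normalize(float[] rawScores) {
        float max = 0.0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        // Assumption: scores are left untouched when the best raw score is <= 1.0.
        float norm = max > 1.0f ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }
}
```

Under that assumption, raw scores of 4.0, 2.0 and 1.0 become 1.0, 0.5 and 0.25, so the top hit reads as a full score however few terms it matched.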

Unfortunately I only have access to an XML web service that lets me send a
Lucene query, so I don't have access to the various IndexSearcher
methods. So it sounds like, rather than just taking the top result and
using that, I'll have to take the first 10 results and work out my own
score based on how many fields match and the importance of each field,
unless anyone has any other ideas?
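
One way the do-it-yourself scoring described above could look is sketched below, assuming each result from the web service has already been parsed into a field-to-value map. The class name, the map keys, the default weight of 1.0 and the exact-match comparison are all illustrative assumptions; misspelled tag values would need a fuzzier comparison than this.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical client-side rescoring of results returned by the web service. */
final class FieldMatchScorer {

    /**
     * Returns a score from 0 to 100: the weighted share of query fields whose
     * value equals the candidate's value, ignoring case and outer whitespace.
     * Fields without an entry in weights count with weight 1.0.
     */
    static double score(Map<String, String> query,
                        Map<String, String> candidate,
                        Map<String, Double> weights) {
        double matched = 0.0;
        double total = 0.0;
        for (Map.Entry<String, String> e : query.entrySet()) {
            double w = weights.getOrDefault(e.getKey(), 1.0);
            total += w;
            String actual = candidate.get(e.getKey());
            if (actual != null && normalize(actual).equals(normalize(e.getValue()))) {
                matched += w;
            }
        }
        return total == 0.0 ? 0.0 : 100.0 * matched / total;
    }

    private static String normalize(String s) {
        return s.trim().toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Map<String, String> query = new HashMap<>();
        query.put("track", "Minus");
        query.put("artist", "rubbish");
        query.put("release", "odelay");

        Map<String, String> candidate = new HashMap<>();
        candidate.put("track", "Minus");
        candidate.put("artist", "Beck");
        candidate.put("release", "Odelay");

        // Empty weight map: every field counts equally.
        System.out.println(score(query, candidate, new HashMap<>()));
    }
}
```

With equal weights, a candidate that matches on track and release but not on artist scores about 66.7, the 2 out of 3 figure, and only a candidate matching every field reaches 100. Raising the weight of one field makes a mismatch on that field cost more.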

Otis Gospodnetic wrote:
> Paul,
>
> That is because the Hits class is likely being used under the covers.  When you use the
> IndexSearcher's search(...) method that returns Hits, hit scores are normalized, so they are
> always between 0 and 1.  If you want the raw scores, and it sounds like you do, you could
> use a lower-level search method, like the one that uses a HitCollector.
>
> Otis 
>
> . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
>
> ----- Original Message ----
> From: Paul Taylor <paul_t100@fastmail.fm>
> To: java-user@lucene.apache.org
> Sent: Wednesday, May 2, 2007 10:30:12 AM
> Subject: Problem with scoring:is it absolute or relative ?
>
> Hi, I am having problems understanding Lucene's scoring. I am using
> MusicBrainz, which uses Lucene to provide a search facility over its
> data, which put simply consists of a database of recording artists,
> albums and song titles.
>
> I can construct a query such as:
>     track:Minus AND artist:beck AND release:odelay
>
> to return tracks called Minus by the artist Beck for the release Odelay,
> which works fine if all terms are correct. However, I am trying to
> implement the search in a program which takes the terms from values
> embedded in existing audio files, so they may not be correct. So instead
> of using an AND search I want to use an OR search such as
>     track:Minus  artist:beck  release:odelay
> to improve the chances of getting a match, and then use the score
> returned to work out whether the result is sufficiently similar to accept it
> as a match.
>
> Now the good news is that when multiple results are returned the record
> with the top score is normally the best match, BUT the bad news is that
> if the match is very poor the top-rated records normally still return a score
> of 100 even though only a few of the terms match.
>
>     track:Minus  artist:rubbish  release:odelay
>
> will return the Minus tracks by Beck for the release Odelay even though
> the artist term is incorrect, because this is the best match it can make
> (2 out of 3 terms matched), but it returns a score of 100 when I
> would expect a score of 66 because only two of the terms match.
>
> Why is it doing this, and how can I create a query that only returns 100
> if all terms match but will still return possible matches where only some
> terms match? I have spoken to the creators of MusicBrainz; they say they
> are not using any kind of document/field boosting and are not sure how
> it works, as they've only added Lucene to the site quite recently.
>
> The above queries can be tested from
>
> http://musicbrainz.org/search.html
>
> by:
>     entering the query in the 'Search For' box
>     selecting type:Track
>     selecting Use advanced query syntax 
> <http://musicbrainz.org/popup/TextSearchSyntax>
>
> and clicking on Indexed Search
>
> thanks Paul Taylor
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

