lucene-solr-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject RE: Highlight - get terms used by lucene
Date Mon, 31 Mar 2008 22:55:15 GMT

: Solr returns the max score and the score per document.

: This means that the best hit always is 100% which is not always what you 
: want because the article itself could still be quite irrelevant...

Solr doesn't give you a percentage, and there's no reason to divide a 
doc's score by maxScore to get a percentage, any more than there would be 
with the Oracle function as described.  The Oracle docs don't say that you 
can divide a score of 23 by a max score of 100 to determine it's a 23% 
match, just that scores will always be less than 100 ... in fact the doc 
you linked to specifically says you can't compare scores, so a score of 23 
for one query doesn't mean the same thing as a score of 23 from another 
query (which is also true for Lucene scores BTW; Lucene just doesn't 
promise you any particular max score because there are so many more 
interesting and complex query types in Lucene that make determining such 
a max impossible).
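A quick sketch in Java of why a per-query percentage carries no absolute 
meaning. It uses made-up scores for two hypothetical queries: dividing by 
each query's own maxScore always maps the top hit to 100%, whether its raw 
score is 4.2 or 0.05. The class and method names are illustrative only and 
are not part of Solr's API.

```java
import java.util.Arrays;

class MaxScoreNormalization {
    // Expresses every score as a percentage of the largest score in the
    // same result set, the normalization the quoted message describes.
    static double[] toPercentOfMax(double[] scores) {
        double max = Arrays.stream(scores).max().orElse(0.0);
        double[] pct = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            // Divide first so the top hit is exactly 1.0 before scaling.
            pct[i] = max > 0 ? scores[i] / max * 100.0 : 0.0;
        }
        return pct;
    }

    public static void main(String[] args) {
        // Hypothetical raw scores, invented for illustration only.
        double[] strongQuery = {4.2, 3.1, 0.9};
        double[] weakQuery = {0.05, 0.02, 0.01};
        System.out.println(Arrays.toString(toPercentOfMax(strongQuery)));
        System.out.println(Arrays.toString(toPercentOfMax(weakQuery)));
    }
}
```

Both result sets report their best hit as 100%, even though one query's 
matches score nearly two orders of magnitude lower than the other's.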

My main point was: rather than letting Solr score the results one way and 
then trying to come up with your own variation on that score externally 
(which is error prone, since your scoring variation might produce a 
different ordering and change which results appear on each "page"), let 
Solr compute the score for you.
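To illustrate the paging problem, here is a hypothetical Java sketch with 
invented scores. Solr pages by its own score, and only the first page is 
re-sorted by an external formula. The result can differ from what the 
external formula would put on page one if it had been applied to every hit 
before paging. Hit, globalPage and pageThenRescore are names made up for 
this example (records need Java 16 or later).

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class PagedRescoring {
    // A hypothetical hit: Solr's score plus a score from some external
    // formula that weighs things differently.
    record Hit(String id, double solrScore, double externalScore) {}

    // Page one if the external formula is applied to all hits before paging.
    static List<String> globalPage(List<Hit> hits, int pageSize) {
        return hits.stream()
                .sorted(Comparator.comparingDouble(Hit::externalScore).reversed())
                .limit(pageSize)
                .map(Hit::id)
                .collect(Collectors.toList());
    }

    // Page one if Solr pages by its own score and the page is re-sorted
    // afterwards with the external formula.
    static List<String> pageThenRescore(List<Hit> hits, int pageSize) {
        return hits.stream()
                .sorted(Comparator.comparingDouble(Hit::solrScore).reversed())
                .limit(pageSize)
                .sorted(Comparator.comparingDouble(Hit::externalScore).reversed())
                .map(Hit::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("a", 3.0, 0.2),
                new Hit("b", 2.5, 0.9),
                new Hit("c", 1.0, 0.8));
        System.out.println("rescore then page: " + globalPage(hits, 2));
        System.out.println("page then rescore: " + pageThenRescore(hits, 2));
    }
}
```

With this sample data, re-scoring everything first gives [b, c], while 
re-scoring only Solr's first page gives [b, a]: document c never makes it 
onto the page the user sees.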

If you aren't happy with the way Solr computes the score, and you want a 
simpler score calculation like what Oracle provides (one that will only 
work for simple term queries), write a custom Similarity class that does 
what you want...

http://lucene.apache.org/java/2_3_1/api/org/apache/lucene/search/Similarity.html
http://wiki.apache.org/solr/SolrPlugins

Off the cuff, I think you'd get what Oracle describes by:
  - omitting norms on all fields in your schema.xml
  - making Similarity.queryNorm(float) always return 3
  - making Similarity.tf(float) always return its input
  - not using query boosts
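The list above can be checked against Lucene's documented scoring formula, 
score(q,d) = coord(q,d) · queryNorm(q) · Σ tf(t in d) · idf(t)² · 
t.getBoost() · norm(t,d). The Java sketch below is a standalone model of 
that formula for a single-term query, not Lucene code. It assumes 
DefaultSimilarity's defaults (tf = sqrt(freq), queryNorm = 
1/sqrt(sumOfSquaredWeights), idf = 1 + ln(numDocs/(docFreq+1))); omitting 
norms makes norm 1, and with one term and no boosts, coord and boost are 1. 
In a real plugin, queryNorm(float) and tf(float) would be overridden in a 
subclass of org.apache.lucene.search.DefaultSimilarity registered as 
described on the SolrPlugins page; the class and method names below are 
hypothetical.

```java
class SingleTermScoreModel {
    // DefaultSimilarity-style idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // Default behaviour for a one-term query with no boosts (coord = 1):
    // queryNorm = 1 / sqrt(sumOfSquaredWeights), where sumOfSquaredWeights
    // is idf^2 for a single unboosted term; tf = sqrt(freq); norm = norm(t,d).
    static double defaultScore(double freq, int docFreq, int numDocs, double norm) {
        double idf = idf(docFreq, numDocs);
        double queryNorm = 1.0 / Math.sqrt(idf * idf);
        double tf = Math.sqrt(freq);
        return queryNorm * tf * idf * idf * norm;
    }

    // The same query with the tweaks from the list: queryNorm fixed at 3,
    // tf returning the raw frequency, and norms omitted (norm = 1).
    static double tweakedScore(double freq, int docFreq, int numDocs) {
        double idf = idf(docFreq, numDocs);
        double queryNorm = 3.0;
        double tf = freq;
        double norm = 1.0;
        return queryNorm * tf * idf * idf * norm;
    }

    public static void main(String[] args) {
        // Hypothetical statistics: 1000 docs, the term occurs in 9 of them
        // and 4 times in the document being scored.
        int numDocs = 1000;
        int docFreq = 9;
        double freq = 4;
        System.out.println("default: " + defaultScore(freq, docFreq, numDocs, 1.0));
        System.out.println("tweaked: " + tweakedScore(freq, docFreq, numDocs));
    }
}
```

In this model the default reduces to sqrt(freq) · idf · norm, while the 
tweaked version becomes 3 · freq · idf². Note that idf still enters 
squared; idf(int, int) is another Similarity method that could be adjusted 
if that matters for the comparison.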

...all bets are off, though, if you use multi-term queries (or phrase 
queries, or fuzzy queries, etc.), but you can play with the other methods 
in Similarity if you have a particular idea of how you'd like those scored 
if you do use them.
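For multi-term queries, a model of the tweaked scoring has to account for 
coord(int, int) and a sum over the matching terms. The Java sketch below 
assumes a plain OR query, DefaultSimilarity's coord (overlap / maxOverlap), 
its default idf, and the tweaks listed earlier (queryNorm = 3, tf = freq, 
no norms, no boosts). It shows that the score now also depends on what 
fraction of the query terms a document matches. It is a hypothetical model 
for illustration, not Lucene code.

```java
class MultiTermScoreModel {
    // DefaultSimilarity-style coord: fraction of the query terms matched.
    static double coord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }

    // DefaultSimilarity-style idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // freqs[i] is the document's frequency of query term i (0 if absent);
    // docFreqs[i] is that term's document frequency in the index.
    static double tweakedScore(double[] freqs, int[] docFreqs, int numDocs) {
        int overlap = 0;
        double sum = 0.0;
        for (int i = 0; i < freqs.length; i++) {
            if (freqs[i] > 0) {
                overlap++;
                double idf = idf(docFreqs[i], numDocs);
                // queryNorm (3) * tf (raw freq) * idf^2 * norm (1)
                sum += 3.0 * freqs[i] * idf * idf;
            }
        }
        return coord(overlap, freqs.length) * sum;
    }

    public static void main(String[] args) {
        // Hypothetical statistics: two query terms, each in 9 of 1000 docs.
        int[] docFreqs = {9, 9};
        System.out.println(tweakedScore(new double[] {2, 0}, docFreqs, 1000));
        System.out.println(tweakedScore(new double[] {2, 2}, docFreqs, 1000));
    }
}
```

With these numbers, matching both terms scores four times as much as 
matching one of them: twice from the extra term in the sum, and twice 
again from coord going from 1/2 to 1.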


-Hoss

