lucene-solr-user mailing list archives

From Em <mailformailingli...@yahoo.de>
Subject Re: Scoring: Precedent for a Rules-/Priority-based Approach?
Date Tue, 08 Feb 2011 18:26:26 GMT

Hi Tavi,

could you please provide an example query that shows your problem, along
with the debugQuery output?
I am confused by the formula you wrote: score(query "apple") =
max(score(field1:apple), score(field2:apple)).

I suspect your problem comes from the norms of the fields involved, but I am
not sure.
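
A toy sketch of how norms could produce the ranking flip described in the
quoted message below. It assumes the classic default lengthNorm of
1/sqrt(number of terms in the field) and fixes tf and idf at 1; the document
names and field lengths are made up for illustration, and Lucene's lossy
one-byte norm encoding is ignored.

```python
import math

def length_norm(num_terms):
    # Assumed simplified norm: 1 / sqrt(number of terms in the field).
    return 1.0 / math.sqrt(num_terms)

def field_score(matches, num_terms):
    # Toy per-field score: tf and idf are fixed at 1, so only the norm remains.
    return length_norm(num_terms) if matches else 0.0

# Hypothetical documents:
# (title matches, title length, body matches, body length)
docs = {
    "title_only": (True, 2, False, 16),
    "title_and_body": (True, 3, True, 16),
}

def rank(combine):
    # Combine the two per-field scores and sort document names, best first.
    scores = {
        name: combine(field_score(tm, tl), field_score(bm, bl))
        for name, (tm, tl, bm, bl) in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

sum_ranking = rank(lambda a, b: a + b)  # old behaviour: add field scores
max_ranking = rank(max)                 # new behaviour: take the best field
print("sum:", sum_ranking)
print("max:", max_ranking)
```

With summing, the extra body match outweighs the slightly longer title. With
max, only the title scores compete, so the shorter title wins on lengthNorm
alone.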

If you can, show us the relevant part of your schema.xml and the debugQuery
output, so that we can have a look at it.
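
For reference, the score explanation is returned when the request carries
debugQuery=on. A minimal sketch that only builds such a request URL; the host,
port, and handler path are assumptions about a default local install, not
taken from your setup.

```python
from urllib.parse import urlencode

def build_debug_url(base_url, query):
    # debugQuery=on asks Solr to include the score explanation per document;
    # fl=*,score additionally returns the score alongside the stored fields.
    params = {"q": query, "debugQuery": "on", "fl": "*,score"}
    return base_url + "?" + urlencode(params)

# Hypothetical base URL; adjust it to your own Solr instance.
url = build_debug_url("http://localhost:8983/solr/select", "apple")
print(url)
```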

I have to agree with Savvas: tuning scoring for a specific domain is an
exciting thing, and there are lots of approaches out there for improving
scoring.

Regards


Tavi Nathanson wrote:
> 
> Hey everyone,
> 
> I have a question about Lucene/Solr scoring in general. There are many
> factors at play in the final score for each document, and very often one
> factor will completely dominate everything else when that may not be the
> intention.
> 
> ** The question: might there be a way to enforce strict requirements that
> certain factors are higher priority than other factors, and/or certain
> factors shouldn't overtake other factors? Perhaps a set of rules where one
> factor is considered before even examining another factor? Tuning boost
> numbers around and hoping for the best seems imprecise and very fragile.
> **
> 
> To make this more concrete, an example:
> 
> We previously added the scores of multi-field matches together via an OR,
> so: score(query "apple") = score(field1:apple) + score(field2:apple). I
> changed that to be more in line with DisMaxParser, namely a max:
> score(query "apple") = max(score(field1:apple), score(field2:apple)). I
> also modified coord such that coord would only consider actual unique
> terms ("apple" vs. "orange"), rather than terms across multiple fields
> (field1:apple vs. field2:apple).
> 
> This seemed like a good idea, but it actually introduced a bug that was
> previously hidden. Suddenly, documents matching "apple" in the title and
> *nothing* in the body were being boosted over documents matching "apple"
> in the title and "apple" in the body! I investigated, and it was due to
> lengthNorm: previously, documents matching "apple" in both title and body
> were getting very high scores and completely overwhelming lengthNorm. Now
> that they were no longer getting *such* high scores, which was beneficial
> in many respects, they were also no longer overwhelming lengthNorm. This
> allowed lengthNorm to dominate everything else.
> 
> I'd love to hear your thoughts :)
> 
> Tavi
> 
> 
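
One way to picture the strict-priority idea from the quoted question is a
lexicographic sort, where a lower-priority factor only breaks ties in the
factors above it. The sketch below is not a Lucene or Solr API; the Hit type,
its fields, the sample values, and the choice of priority order are all
hypothetical.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    # Hypothetical per-document factors, collected outside of Lucene.
    doc_id: str
    unique_terms_matched: int
    fields_matched: int
    length_norm: float

def priority_key(hit):
    # Tuples compare element by element, so length_norm can never
    # overtake the number of matched terms or matched fields.
    return (hit.unique_terms_matched, hit.fields_matched, hit.length_norm)

hits = [
    Hit("title_only", 1, 1, 0.71),
    Hit("title_and_body", 1, 2, 0.58),
    Hit("two_terms", 2, 1, 0.25),
]

ranked = [h.doc_id for h in sorted(hits, key=priority_key, reverse=True)]
print(ranked)
```

Here the document matching two distinct terms comes first despite its low
norm, and the norm only decides between documents that tie on both counts.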

-- 
View this message in context: http://lucene.472066.n3.nabble.com/Scoring-Precedent-for-a-Rules-Priority-based-Approach-tp2452340p2453161.html
Sent from the Solr - User mailing list archive at Nabble.com.
