lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@apache.org>
Subject Re: The values which compute scores.
Date Wed, 30 May 2007 22:00:49 GMT
Hi Walt,

One question that comes to mind, is what are you looking to do?  Are  
you not happy with the current scoring or you just trying to better  
understand scoring?  The calls to Similarity.tf(), etc. are call  
backs from within the scoring algorithm (have a look at TermScorer in  
the code) and provide a means for an application to change the score,  
but in many cases there really isn't too much incentive to do so.

-Grant


On May 30, 2007, at 4:45 PM, Walt Stoneburner wrote:

> Hopefully I'm not opening myself up to public ridicule with what may
> be a very stupid question, but...
>
> At the moment, I'm trying to wrap my head around some of the math that
> happens when Lucene does scoring.  Let's put aside the big equation
> for a moment and focus on a simple method, such as tf().  [term
> frequency]
>
> I understand that tf(freq) is supposed to return larger values when
> freq is large, and smaller values when freq is small.  Though here's
> what making me scratch my head today:
>
> a) Where does freq come from?  (Not what is it, but who computes it  
> and how?)
>
> Reason I ask is:
>
> b) How do I know what "large" and "small" is, as I don't really have a
> relative scale of what the max and min values are?  Should I just
> assume a linear scale of 1.0 to 0.0 will be passed to the method?
>
> But then that begs the question...
>
> c) What values should I be passing out of a function like this?
> Should I normalize my outgoing scores to some scale, or do I simply
> just need to provide numbers that "have the right shaped curve".
>
> I wish the documentation shed a smidgen bit more light in those areas.
>
>
> I look at things like idf() which returns 1+log(ratio) and then has
> that value squared.  Clearly that isn't on a scale of 1.0 to 0.0.
>
> I feel like there may be some mathematical trickery going on and that
> maybe the actual score values themselves don't matter inside the
> ranking code, so long as their relative values to one another.
>
> This then makes me ponder how the normalization process is done
> between queries, allowing for a mix'n'match of results as these
> numbers spill to the outside world.  Obviously normalization has to
> happen at that point for the mixing query results magic to work.
>
>
> Is there a math wizard in the group who can talk to me like I'm  
> four years old?
>
> -wls
> http://www.wwco.com/~wls/blog/
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

--------------------------
Grant Ingersoll
Center for Natural Language Processing
http://www.cnlp.org/tech/lucene.asp

Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/ 
LuceneFAQ



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message