lucene-java-user mailing list archives

From Paul Taylor <>
Subject Re: SweetSpotSimilarity
Date Tue, 06 Mar 2012 10:08:49 GMT
On 05/03/2012 19:26, Chris Hostetter wrote:
> : very small to occasionally very large.  It also might be the case that
> : cover letters and e-mails, while short, are not really something to
> : heavily discount.  The lower discount range can be ignored by setting
> : the min of any sweet spot to 1.  Then one starts to wonder if there
> : really is any level area.
> I would definitely not suggest using SSS for fields like legal brief text
> or emails where there is huge variability in the length of the content --
> I can't think of any context where a "short" email is definitively
> better/worse than a "long" email.  More traditional TF/IDF seems like it
> would make more sense there.
> : When I get that deep in the code the issue is not simply the shape of
> : the equation, but issues like how tweaking any parameters affects the
> : overall document scores.  For example, consider the comments about
> : "steepness" related to length norm.  They cover (some of) the mathematics
> : of the equation, but until one spends some time with that equation and
> : understands how the pieces fit together, I doubt it jumps out at most
> : folks what larger or smaller values mean for terms and resulting document
> : scores.
> :
> : One obvious, hard-to-tease-out part of the Similarity API is when each
> : part is called -- the simplest being index time vs. search time -- there
> Well... hopefully the Similarity docs and the docs on Lucene scoring
> have filled in most of those blanks before you drill down into the
> specifics of how SSS works.  If not, then any concrete improvements you can
> suggest would certainly be appreciated...
Chapter 12, "Document Ranking", in Hibernate Search in Action gives a
thorough explanation of Lucene scoring and the Similarity class, which
I've found helpful. I think it's worth mentioning, as it's not the most
obvious book for this subject.
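For anyone trying to build intuition for the "steepness" parameter discussed above, here is a minimal standalone sketch. It re-implements the length-norm formula as given in the SweetSpotSimilarity javadoc, 1/sqrt(steepness * (|x - min| + |x - max| - (max - min)) + 1), next to DefaultSimilarity's classic 1/sqrt(numTerms) norm (ignoring field boosts and the lossy one-byte norm encoding). It does not call into Lucene, since that API has shifted between versions; the method names, the 100 to 1000 term sweet spot and the steepness values are hypothetical choices for illustration only.

```java
// Standalone sketch: reproduces the documented SweetSpotSimilarity
// length-norm formula without depending on any Lucene version.
// Method names and parameter values are hypothetical, for illustration.
public class LengthNormSketch {

    // Classic TF/IDF length norm: 1 / sqrt(numTerms)
    static float classicLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Sweet-spot length norm:
    // 1 / sqrt(steepness * (|n - min| + |n - max| - (max - min)) + 1)
    // The distance term is 0 for any n in [min, max], so the norm is 1.0
    // there; outside the plateau it decays, faster for larger steepness.
    static float sweetSpotLengthNorm(int numTerms, int min, int max, float steepness) {
        float distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * distance + 1.0f));
    }

    public static void main(String[] args) {
        int min = 100, max = 1000; // hypothetical sweet spot
        int[] lengths = {10, 100, 500, 1000, 5000};
        for (float steepness : new float[] {0.1f, 0.5f}) {
            System.out.println("steepness = " + steepness);
            for (int n : lengths) {
                System.out.printf("  n=%5d  classic=%.4f  sweetSpot=%.4f%n",
                        n, classicLengthNorm(n), sweetSpotLengthNorm(n, min, max, steepness));
            }
        }
    }
}
```

Inside [min, max] the norm is exactly 1.0 (the "level area"); outside it, with d the number of terms by which the length falls outside the range, the norm is 1/sqrt(2 * steepness * d + 1), so a larger steepness penalizes lengths outside the plateau more sharply. Setting min to 1, as suggested in the quoted message, leaves only long fields discounted.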

