lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: SweetSpotSimilarity
Date Tue, 06 Mar 2012 10:08:49 GMT
On 05/03/2012 19:26, Chris Hostetter wrote:
> : very small to occasionally very large.  It also might be the case that
> : cover letters and e-mails while short might not be really something to
> : heavily discount.  The lower discount range can be ignored by setting
> : the min of any sweet spot to 1.  Then one starts to wonder if there
> : really is any level area.
>
> I would definitely not suggest using SSS for fields like legal brief text
> or emails where there is huge variability in the length of the content --
> i can't think of any context where a "short" email is definitively
> better/worse than a "long" email.  more traditional TF/IDF seems like it
> would make more sense there.
>
> : When I get that deep in the code the issue is not simply the shape of
> : the equation, but issues like how tweaking any parameters affects the
> : overall document scores.  For example, consider the comments about
> : "steepness" related to length norm.  It talks (some) mathematics of the
> : equation, but until one spends some time with that equation and
> : understanding where they all fit together, I doubt it jumps out at most
> : folks what large or smaller values mean for terms and resulting document
> : scores.
> :
> : One obvious hard to tease out part of the Similarity API is when each
> : part is called -- the simplest being index time vs. search time -- there
>
> well ... hopefully the Similarity docs and the docs on Lucene scoring
> have filled in most of those blanks before you drill down into the
> specifics of how SSS works.  if not, then any concrete improvements you can
> suggest would certainly be appreciated...
>
> https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/index.html
> https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/org/apache/lucene/search/similarities/Similarity.html
>
> https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/site/build/site/scoring.html?view=co
>
>
Chapter 12, "Document Ranking", in Hibernate Search in Action gives a
thorough explanation of Lucene scoring and the Similarity class, which
I've found helpful. I think it's worth mentioning, as it is not the most
obvious book for this subject.
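
For anyone trying to get a feel for the sweet spot min/max and the
"steepness" parameter discussed above, here is a minimal standalone
sketch of the length norm curve. It is not Lucene code: the class and
method names are made up for illustration, and the formula is taken from
a reading of the SweetSpotSimilarity javadoc for computeLengthNorm,
1/sqrt(steepness * (|x-min| + |x-max| - (max-min)) + 1), so please check
it against the source of the Lucene version you use.

```java
// Standalone sketch, NOT Lucene code: class and method names are hypothetical.
// Assumes the length norm formula described in the SweetSpotSimilarity javadoc.
final class SweetSpotLengthNormSketch {

    /**
     * Length norm for a field with numTerms terms.
     * Inside [min, max] the bracketed term is 0, so the norm is 1.0 (the plateau).
     * Outside, the term equals twice the distance to the nearest edge,
     * so the norm falls off faster as steepness grows.
     */
    static double lengthNorm(int numTerms, int min, int max, double steepness) {
        int distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return 1.0 / Math.sqrt(steepness * distance + 1.0);
    }

    public static void main(String[] args) {
        // min = 1 removes the discount for short fields, as suggested in the thread.
        int min = 1;
        int max = 10;
        double steepness = 0.5;
        for (int numTerms : new int[] {1, 5, 10, 13, 50, 500}) {
            System.out.printf("numTerms=%d norm=%.4f%n",
                    numTerms, lengthNorm(numTerms, min, max, steepness));
        }
    }
}
```

With min set to 1, every field of up to max terms gets a norm of 1.0, so
only fields longer than max are discounted; a larger steepness makes the
drop past max sharper.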

Paul


