lucene-java-user mailing list archives

From Paul Hill <p...@metajure.com>
Subject RE: SweetSpotSimilarity
Date Mon, 05 Mar 2012 23:01:24 GMT
> I would definitely not suggest using SSS for fields like legal brief text or emails where there is huge
> variability in the length of the content -- i can't think of any context where a "short" email is
> definitively better/worse than a "long" email.  more traditional TF/IDF seems like it would make more
> sense there.

I was coming to a similar conclusion.
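
To make the contrast concrete, here is a rough sketch of the two length norms. It is not Lucene code: the class and method names are made up, the sweet-spot arithmetic follows the formula given in the SweetSpotSimilarity javadoc, and the min/max/steepness values (100, 1000, 0.5) are arbitrary assumptions. Field boosts and the lossy one-byte norm encoding are ignored.

```java
// Hypothetical sketch, not Lucene code: only the length-norm arithmetic is mirrored.
class LengthNormSketch {

    // Classic (DefaultSimilarity-style) length norm: 1/sqrt(numTerms),
    // so a shorter field always gets a larger norm than a longer one.
    static double classicLengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Sweet-spot length norm: flat at 1.0 for lengths in [min, max],
    // falling off outside that plateau at a rate set by steepness.
    static double sweetSpotLengthNorm(int numTerms, int min, int max, double steepness) {
        double distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return 1.0 / Math.sqrt(steepness * distance + 1.0);
    }

    public static void main(String[] args) {
        // Assumed field lengths, spanning the kind of range emails or briefs might cover.
        int[] lengths = {50, 500, 5000, 50000};
        for (int n : lengths) {
            System.out.printf("%6d terms: classic=%.4f sweetSpot=%.4f%n",
                    n, classicLengthNorm(n), sweetSpotLengthNorm(n, 100, 1000, 0.5));
        }
    }
}
```

With min = max = 1 and steepness = 0.5 the sweet-spot version reduces to the classic 1/sqrt(numTerms). The plateau only helps when there is a length range that really is "ideal", which is exactly what is missing for emails and briefs.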

> well ... hopefully the Similarity docs and the docs on Lucene scoring have filled in most of those
> blanks before you drill down into the specifics of how SSS works.  if not, then any concrete
> improvements you can suggest would certainly be appreciated...
> 
> https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/index.html
> https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/org/apache/lucene/search/similarities/Similarity.html
> 
> https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/site/build/site/scoring.html?view=co

Thanks for the links.  
The first thing I notice is that what is listed at the top of Similarity has totally changed.
Great stuff about the object interaction. For example, I didn't understand how the Weight object
fit in until reading that.
But I see I got what I asked for.  Someone thought describing the object interaction was more
important than the scoring formula itself.  I'll chew on it (but I'm currently using the 3.4
code).

My only thought is that the new stuff seems to be at the expense of the formulas listed in
the old class overview for Similarity.
http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/search/Similarity.html
I would think that some of the old math, particularly the formula as it corresponds to the
methods, would still be useful information even if I can't claim to know where it might be
placed.
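
For reference, this is roughly the method-to-formula mapping I mean, as a standalone sketch of the 3.x DefaultSimilarity arithmetic. The class name and the two helper methods at the bottom are invented for illustration; all boosts are assumed to be 1 and the one-byte norm encoding is ignored, so treat it as an approximation rather than what Lucene actually computes.

```java
// Hypothetical sketch of the classic formula:
//   score(q,d) = coord(q,d) * queryNorm(q) * SUM over t in q of
//                ( tf(t in d) * idf(t)^2 * t.getBoost() * norm(t,d) )
class ClassicScoringSketch {

    // tf(): square root of the term's frequency in the document.
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // idf(): 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // lengthNorm part of norm(t,d): 1/sqrt(number of terms in the field).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // coord(): fraction of the query terms that the document matched.
    static double coord(int overlap, int maxOverlap) {
        return overlap / (double) maxOverlap;
    }

    // queryNorm(): 1/sqrt(sumOfSquaredWeights); makes scores comparable across queries.
    static double queryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    // One term's contribution to the sum (helper invented for this sketch).
    static double termContribution(double freq, int docFreq, int numDocs,
                                   double termBoost, int numTerms) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * termBoost * lengthNorm(numTerms);
    }

    // Full score for a single-term query with boost 1 (helper invented for this sketch).
    static double singleTermScore(double freq, int docFreq, int numDocs, int numTerms) {
        double idf = idf(docFreq, numDocs);
        double sumOfSquaredWeights = idf * idf; // only one term in the query
        return coord(1, 1) * queryNorm(sumOfSquaredWeights)
                * termContribution(freq, docFreq, numDocs, 1.0, numTerms);
    }

    public static void main(String[] args) {
        // Assumed numbers: term occurs 4 times in a 16-term field, 9 of 100 docs contain it.
        System.out.println(singleTermScore(4, 9, 100, 16));
    }
}
```

For a single-term query the queryNorm cancels one factor of idf, so the result collapses to tf * idf * lengthNorm. Details like that are easy to see when the formula sits next to the methods.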

Maybe something like the site scoring page could talk about how the arithmetic maps to the methods
and how phrase scoring messes with scoring.
Just my $0.02

thanks

-Paul



