lucene-java-user mailing list archives

From Paul Taylor
Subject Re: SweetSpotSimilarity
Date Tue, 06 Mar 2012 22:57:23 GMT
On 05/03/2012 23:24, Robert Muir wrote:
> On Mon, Mar 5, 2012 at 6:01 PM, Paul Hill wrote:
>>> I would definitely not suggest using SSS for fields like legal brief text or emails,
>>> where there is huge variability in the length of the content -- I can't think of any
>>> context where a "short" email is definitively better/worse than a "long" email. More
>>> traditional TF/IDF seems like it would make more sense there.
>> I was coming to a similar conclusion.
>>> Well ... hopefully the Similarity docs and the docs on Lucene scoring have filled in
>>> most of those blanks before you drill down into the specifics of how SSS works. If not,
>>> then any concrete improvements you can suggest would certainly be appreciated...
>> Thanks for the links.
>> The first thing I notice is that what is listed at the top of Similarity is totally
>> changed. Great stuff about the object interaction. For example, I didn't understand how
>> the Weight object fit in until reading that.
>> But I see I got what I asked for. Someone thought describing the object interaction
>> was more important than the scoring formula itself. I'll chew on it (but I'm currently
>> using the 3.4 code).
>> My only thought is that the new stuff seems to be at the expense of the formulas
>> listed in the old class overview for Similarity.
> Hello,
> what was previously Similarity in older releases has moved to
> TFIDFSimilarity: it extends Similarity and exposes a vector-space API,
> with the same formulas in the javadocs:
Looks good. Do you know if this stuff will make it into 3.6?
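
For anyone following the length-normalization point quoted above, here is a minimal, stand-alone sketch (not code from this thread, and with no Lucene dependency) that compares the classic 1/sqrt(numTerms) length norm with the plateau-shaped norm that, as far as I recall from the SweetSpotSimilarity javadocs, is computed as 1/sqrt(steepness * (|x - min| + |x - max| - (max - min)) + 1). The class name, method names, and the min/max/steepness values are hypothetical and chosen only for illustration; check the javadocs for your Lucene version before relying on the formula.

```java
// Illustrative sketch only: plain Java, no Lucene classes are used.
final class LengthNormSketch {

    // Classic length norm: shorter fields always score higher.
    // Assumes numTerms >= 1.
    static double classicNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Plateau-shaped norm: every length in [min, max] gets the same norm (1.0),
    // and the norm falls off on both sides at a rate set by steepness.
    static double sweetSpotNorm(int numTerms, int min, int max, double steepness) {
        // Zero inside [min, max]; twice the distance to the nearest edge outside it.
        double distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return 1.0 / Math.sqrt(steepness * distance + 1.0);
    }

    public static void main(String[] args) {
        // Hypothetical sweet spot of 100..500 terms; these are not Lucene defaults.
        int min = 100;
        int max = 500;
        double steepness = 0.5;
        for (int n : new int[] {10, 100, 300, 500, 5000}) {
            System.out.printf("%5d terms: classic=%.4f sweetSpot=%.4f%n",
                    n, classicNorm(n), sweetSpotNorm(n, min, max, steepness));
        }
    }
}
```

The plateau only helps when there is a range of lengths that is genuinely "about right" for the field. For content like emails or legal briefs, where no such range exists, the plateau has nothing meaningful to anchor to, which is the concern raised in the quoted text.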

