lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: SweetSpotSimilarity
Date Tue, 06 Mar 2012 22:57:23 GMT
On 05/03/2012 23:24, Robert Muir wrote:
> On Mon, Mar 5, 2012 at 6:01 PM, Paul Hill<paul@metajure.com>  wrote:
>>> I would definitely not suggest using SSS for fields like legal brief text or
>>> emails where there is huge variability in the length of the content -- I can't
>>> think of any context where a "short" email is definitively better/worse than a
>>> "long" email.  More traditional TF/IDF seems like it would make more sense there.
>> I was coming to a similar conclusion.
>>
>>> Well ... hopefully the Similarity docs and the docs on Lucene scoring have
>>> filled in most of those blanks before you drill down into the specifics of
>>> how SSS works.  If not, then any concrete improvements you can suggest would
>>> certainly be appreciated...
>>>
>>> https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/index.html
>>> https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/org/apache/lucene/search/similarities/Similarity.html
>>>
>>> https://svn.apache.org/viewvc/lucene/dev/trunk/lucene/site/build/site/scoring.html?view=co
>> Thanks for the links.
>> The first thing I notice is that what is listed at the top of Similarity is
>> totally changed.  Great stuff about the object interaction.  For example, I
>> didn't understand how the Weight object fit in until reading that.
>> But I see I got what I asked for.  Someone thought describing the object
>> interaction was more important than the scoring formula itself.  I'll chew on
>> it (but I'm currently using the 3.4 code).
>>
>> My only thought is that the new stuff seems to be at the expense of the
>> formulas listed in the old class overview for Similarity.
> Hello,
>
> What was previously Similarity in older releases has moved to
> TFIDFSimilarity: it extends Similarity and exposes a vector-space API,
> with the same formulas in the javadocs:
> https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-trunk/javadoc/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
>
Looks good. Do you know if this stuff will make it into 3.6?

Paul
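
For anyone following the SSS vs. plain TF/IDF point quoted above, here is a rough sketch in plain Java (no Lucene dependency) of how the two length normalizations differ. The sweet-spot formula is written as I recall it from the SweetSpotSimilarity javadocs, so treat it as an assumption and check it against the release you use. The class name, method names, and the min/max/steepness values are hypothetical, chosen only for illustration, and the sketch ignores field boosts and the lossy encoding of norms.

```java
public class LengthNormSketch {

    // Classic TF/IDF-style length norm: shrinks steadily as the field gets longer.
    static float classicNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Sweet-spot style length norm (assumed formula): flat at 1.0 for lengths
    // inside [min, max], falling off on both sides at a rate set by steepness.
    static float sweetSpotNorm(int numTerms, int min, int max, float steepness) {
        // Distance outside the plateau; zero whenever min <= numTerms <= max.
        float distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return (float) (1.0 / Math.sqrt(steepness * distance + 1.0f));
    }

    public static void main(String[] args) {
        int[] lengths = {10, 100, 500, 1000, 5000};
        for (int n : lengths) {
            // Hypothetical plateau of 100..1000 terms with steepness 0.5.
            System.out.printf("%5d terms: classic=%.4f sweetSpot=%.4f%n",
                    n, classicNorm(n), sweetSpotNorm(n, 100, 1000, 0.5f));
        }
    }
}
```

With a plateau like this, every field between 100 and 1000 terms gets the same norm, which only helps when there really is a preferred length. For emails or legal briefs, where no length is inherently better, that is the argument for staying with the traditional norm.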
