lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject RE: SweetSpotSimilarity
Date Thu, 16 Feb 2012 04:36:54 GMT

: sloppyFreq(distance). hyperbolicTf() only comes into play if you 
: override the tf method in your own subclass to call it instead of the 
: baselineTf which it normally calls.  I also didn't get what it was 
: trying to do.

Correct, as documented...

http://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/contrib-misc/org/apache/lucene/misc/SweetSpotSimilarity.html

"For tf, baselineTf and hyperbolicTf functions are provided, which 
subclasses can choose between."

tf() ... "Delegates to baselineTf"

hyperbolicTf ... "This code is provided as a convenience for subclasses 
that want to use a hyperbolic tf function."

As for what hyperbolicTf is trying to do ... it creates a hyperbolic 
function that lets you specify a hard max for the tf value, no matter 
how many times the term occurs in the document.
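
A minimal sketch of that shape in plain Java, assuming the formula given in 
the SweetSpotSimilarity javadoc. It does not depend on Lucene. The class 
name and the constants in main() are illustrative only, and the ratio of 
powers from the javadoc is written here with Math.tanh, which is 
algebraically the same thing.

```java
// Sketch only: a plain-Java rendering of the hyperbolicTf formula described
// in the SweetSpotSimilarity javadoc. Not the Lucene implementation.
final class HyperbolicTfSketch {

    /**
     * tf(x) = min + (max - min) / 2 * (tanh((x - xoffset) * ln(base)) + 1)
     *
     * tanh(y * ln(base)) equals
     * (base^y - base^-y) / (base^y + base^-y),
     * so the result always stays between min and max.
     */
    static double hyperbolicTf(double freq, double min, double max,
                               double base, double xoffset) {
        double x = freq - xoffset;
        return min + (max - min) / 2.0 * (Math.tanh(x * Math.log(base)) + 1.0);
    }

    public static void main(String[] args) {
        // Illustrative constants, chosen for this example only.
        double min = 0.0, max = 2.0, base = 1.3, xoffset = 10.0;
        for (int freq : new int[] {1, 5, 10, 20, 100, 10000}) {
            System.out.printf("freq=%d tf=%.4f%n", freq,
                    hyperbolicTf(freq, min, max, base, xoffset));
        }
    }
}
```

At freq == xoffset the result sits exactly halfway between min and max. 
Larger frequencies approach max but never exceed it.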

: > And I am aware that SweetSpotSimilarity resulted from this paper
: > 
: > http://trec.nist.gov/pubs/trec16/papers/ibm-haifa.mq.final.pdf

For the record, that paper did not result in SSS -- I wrote SSS ~Dec 2005 
and contributed it to Apache a few months later on behalf of CNET Networks 
where I developed it to solve some specific problems we had with 
product data...

https://issues.apache.org/jira/browse/LUCENE-577
http://mail-archives.apache.org/mod_mbox/lucene-dev/200605.mbox/%3CF9F270C4-FA1E-460F-A54F-E2E56AAD0286%40rectangular.com%3E
(and subsequent replies)

...Doron wrote the paper later, although you'll note lots of discussions 
around that time on the mailing list about customizing Similarity based 
on domain specific data -- the concepts certainly weren't novel.

: > However, I was wondering if there was a resource that explained (and
: > gave examples) of how SSS works and what each parameter (hyperbolic,
: > etc) means. I know this is a Lucene list but I am actually

The functions are pretty clearly spelled out in the javadocs -- you just 
set the options on the class to control the constant values of the 
functions.  The easiest way to understand them is probably to use 
something like gnuplot to graph them using various values for the 
constants, and then compare to graphs of the corresponding functions from 
DefaultSimilarity.
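
One way to produce data for such a graph is sketched below, in plain Java 
with no Lucene dependency. It assumes the baselineTf and hyperbolicTf 
formulas as given in the SweetSpotSimilarity javadoc, and sqrt(freq) for the 
DefaultSimilarity tf. The class name, the helper names, and every constant 
are hypothetical values picked for illustration, not defaults.

```java
import java.util.Locale;

// Sketch only: prints whitespace-separated columns that gnuplot can read.
// The formulas are re-typed from the javadocs; nothing here calls Lucene.
final class TfCurveTable {

    // Illustrative constants; vary these and re-plot to see their effect.
    static final double BASELINE_BASE = 1.5, BASELINE_MIN = 5.0;
    static final double HYPER_MIN = 0.0, HYPER_MAX = 2.0;
    static final double HYPER_BASE = 1.3, HYPER_XOFFSET = 10.0;

    /** DefaultSimilarity-style tf: square root of the frequency. */
    static double defaultTf(double freq) {
        return Math.sqrt(freq);
    }

    /** (x <= min) ? base : sqrt(x + base^2 - min) */
    static double baselineTf(double freq, double base, double min) {
        return (freq <= min) ? base : Math.sqrt(freq + base * base - min);
    }

    /** min + (max - min) / 2 * (tanh((x - xoffset) * ln(base)) + 1) */
    static double hyperbolicTf(double freq, double min, double max,
                               double base, double xoffset) {
        double x = freq - xoffset;
        return min + (max - min) / 2.0 * (Math.tanh(x * Math.log(base)) + 1.0);
    }

    /** One data row: freq, defaultTf, baselineTf, hyperbolicTf. */
    static String row(int freq) {
        return String.format(Locale.ROOT, "%d %.4f %.4f %.4f", freq,
                defaultTf(freq),
                baselineTf(freq, BASELINE_BASE, BASELINE_MIN),
                hyperbolicTf(freq, HYPER_MIN, HYPER_MAX,
                        HYPER_BASE, HYPER_XOFFSET));
    }

    public static void main(String[] args) {
        // gnuplot treats lines starting with '#' in a data file as comments.
        System.out.println("# freq defaultTf baselineTf hyperbolicTf");
        for (int freq = 0; freq <= 50; freq++) {
            System.out.println(row(freq));
        }
    }
}
```

Redirect the output to a file such as tf.dat, then in gnuplot something 
like: plot "tf.dat" using 1:2 with lines, "" using 1:3 with lines, 
"" using 1:4 with lines. With a baseline base and min of 0, baselineTf 
reduces to the same sqrt(freq) curve as the default.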




-Hoss


