lucene-dev mailing list archives

From Grant Ingersoll
Subject Re: Moving SweetSpotSimilarity out of contrib
Date Wed, 03 Sep 2008 21:04:56 GMT

On Sep 3, 2008, at 3:00 PM, Michael McCandless wrote:
> Obviously we can't default everything perfectly since at some point
> there are hard tradeoffs to be made and every app is different, but if
> SweetSpotSimilarity really gives better relevance for many/most apps,
> and doesn't have any downsides (I haven't looked closely myself), I
> think we should get it into core?

Well, we only have 2 data points here: Hoss' original position that
it was helpful, and Doron's Million Query work.  Has anyone else
reported benefit?  In Doron's numbers, the difference between OOTB and
SweetSpot was 0.154 vs. 0.162 for MAP.  Not a huge amount, but still
useful.  There are also other length normalization functions (namely
approaches that don't favor very short documents as much) that I've
seen benefit applications, but as Erik is (in)famous for saying, "it
depends".  In fact, if we go solely on the million query work, we'd be
better off having the Query Parser create phrase queries automatically
for any query w/ more than 1 term (0.19 vs. 0.154) before we even
touch length normalization.
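
To make the length normalization point concrete, here is a rough sketch
comparing a 1/sqrt(numTerms) norm with a plateau-style norm that stops
favoring very short documents.  The function names, the plateau bounds
(10 to 100 terms) and the steepness value are assumptions chosen for
illustration only; they are not taken from this thread or from any
measured configuration.

```python
import math


def default_length_norm(num_terms):
    # Shorter fields always score higher: norm shrinks as 1/sqrt(length).
    return 1.0 / math.sqrt(num_terms)


def plateau_length_norm(num_terms, lo, hi, steepness=0.5):
    # Hypothetical plateau-style norm: every length in [lo, hi] gets the
    # same norm (1.0); lengths outside that range are penalized by how
    # far they fall from the plateau.
    distance = abs(num_terms - lo) + abs(num_terms - hi) - (hi - lo)
    return 1.0 / math.sqrt(steepness * distance + 1.0)


if __name__ == "__main__":
    # Illustrative plateau of 10..100 terms (an assumption, not a tuned value).
    for n in (1, 10, 50, 100, 200):
        print(n, default_length_norm(n), plateau_length_norm(n, 10, 100))
```

With the plateau version, a 1-term field no longer outranks a 50-term
field on length alone, which is the kind of behavior meant by "don't
favor very short documents as much".  Whether that helps is still an
"it depends" question for each collection.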

I've long argued that Lucene needs to take on the relevance question
more head on, and in an open source way.  Until then, we are merely
guessing at what's better, w/o empirical evidence that can be easily
reproduced.  TREC is just one data point, and is often discounted as
not being all that useful in the real world.

I'm on the fence, though.  I agree w/ Hoss that core should be "core"  
and I don't think we want to throw more and more into core, but I also  
agree w/ Mike in that we want good, intelligent defaults for what we  
do have in core.

