lucene-dev mailing list archives

From "Doron Cohen" <>
Subject Re: Moving SweetSpotSimilarity out of contrib
Date Wed, 03 Sep 2008 22:20:17 GMT
My thought was to move SSS to core as a step towards
making it the default, if and when there is more evidence it is
better than current default - it just felt right as a cautious
step - I mean first move it to core so that it is more exposed
and used, and only after a while, maybe, if the evidence is mostly
positive, make it the default.
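
For readers who have not looked at the two length normalizations side by side, here is a rough sketch in Python (not Lucene code). It assumes the classic default of 1/sqrt(numTerms) and the plateau formula given in the SweetSpotSimilarity Javadoc. The function names and the plateau settings (min 100, max 1000, steepness 0.5) are made up for illustration, and the single-byte norm encoding Lucene applies at index time is ignored.

```python
import math


def default_length_norm(num_terms):
    """Classic default: 1/sqrt(numTerms), so shorter fields always score higher."""
    return 1.0 / math.sqrt(num_terms)


def sweet_spot_length_norm(num_terms, min_len, max_len, steepness):
    """Plateau-style norm: 1.0 for lengths in [min_len, max_len], decaying outside.

    Follows the formula documented for SweetSpotSimilarity:
    1/sqrt(steepness * (|x - min| + |x - max| - (max - min)) + 1)
    """
    distance = abs(num_terms - min_len) + abs(num_terms - max_len) - (max_len - min_len)
    return 1.0 / math.sqrt(steepness * distance + 1.0)


if __name__ == "__main__":
    # Hypothetical plateau settings, chosen only for illustration.
    for n in (10, 100, 500, 1000, 5000):
        print(n,
              round(default_length_norm(n), 4),
              round(sweet_spot_length_norm(n, 100, 1000, 0.5), 4))
```

With these settings a 10-term field no longer outranks a 500-term field on length alone, which is the "don't favor very short documents as much" behavior discussed below. Setting min and max to 1 with steepness 0.5 reduces the plateau formula to the default.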

On Thu, Sep 4, 2008 at 12:04 AM, Grant Ingersoll <> wrote:

> On Sep 3, 2008, at 3:00 PM, Michael McCandless wrote:
>> Obviously we can't default everything perfectly since at some point
>> there are hard tradeoffs to be made and every app is different, but if
>> SweetSpotSimilarity really gives better relevance for many/most apps,
>> and doesn't have any downsides (I haven't looked closely myself), I
>> think we should get it into core?
> Well, we only have 2 data points here:  Hoss' original position that it was
> helpful, and Doron's Million Query work.  Has anyone else reported benefit?
>  And in that regard, the difference between OOTB and SweetSpot was 0.154 vs.
> 0.162 for MAP.  Not a huge amount, but still useful.  In that regard, there
> are other length normalization functions (namely approaches that don't favor
> very short documents as much) that I've seen benefit applications as well,
> but as Erik is (in)famous for saying "it depends".  In fact, if we go solely
> based on the million query work, we'd be better off having the Query Parser
> create phrase queries automatically for any query w/ more than 1 term (0.19
> vs 0.154) before we even touch length normalization.
> I've long argued that Lucene needs to take on the relevance question more
> head on, and in an open source way. Until then, we are merely guessing at
> what's better, w/o empirical evidence that can be easily reproduced. TREC
> is just one data point, and is often discounted as not being all that useful
> in the real world.
> I'm on the fence, though.  I agree w/ Hoss that core should be "core" and I
> don't think we want to throw more and more into core, but I also agree w/
> Mike in that we want good, intelligent defaults for what we do have in core.
> -Grant