lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] Commented: (LUCENE-2286) enable DefaultSimilarity.setDiscountOverlaps by default
Date Thu, 25 Feb 2010 18:19:27 GMT


Robert Muir commented on LUCENE-2286:

OK, I will commit in a few days if no one objects. In my opinion the backwards break is the
easiest way to go.

In practice it won't hurt existing docs, and if someone is concerned about bad ranking (because
newly indexed docs suddenly rank better than older ones), they can turn this off with the boolean
until they get a chance to reindex all docs.

> enable DefaultSimilarity.setDiscountOverlaps by default
> -------------------------------------------------------
>                 Key: LUCENE-2286
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Query/Scoring
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>             Fix For: 3.1
>         Attachments: LUCENE-2286.patch
> I think we should enable setDiscountOverlaps in DefaultSimilarity by default.
> If you are using synonyms or commongrams or a number of other 0-posInc-term-injecting
> methods, these currently screw up your length normalization.
> These terms have a position increment of zero, so they shouldn't count towards the length
> of the document.
> I've done relevance tests with Persian showing the difference is significant, and I think
> it's a big trap for anyone using synonyms, etc.: your relevance can actually get worse if you
> don't flip this boolean flag.
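
To make the effect described above concrete, here is a minimal self-contained sketch. It does not use the Lucene API: the class OverlapNormSketch, its Token type, and the sample tokens are all hypothetical, and the norm is assumed to be the plain 1/sqrt(numTerms) form without boosts or byte encoding. It only illustrates how counting zero-position-increment tokens inflates the field length.

```java
public class OverlapNormSketch {

    // Hypothetical stand-in for an analyzed token: term text plus position increment.
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Field length used for normalization. When discountOverlaps is true,
    // tokens with a position increment of zero (injected synonyms etc.) are skipped.
    static int fieldLength(Token[] tokens, boolean discountOverlaps) {
        int length = 0;
        for (Token t : tokens) {
            if (discountOverlaps && t.positionIncrement == 0) {
                continue;
            }
            length++;
        }
        return length;
    }

    // Assumed norm: 1 / sqrt(numTerms), with no boost and no precision loss.
    static double lengthNorm(Token[] tokens, boolean discountOverlaps) {
        int numTerms = fieldLength(tokens, discountOverlaps);
        return numTerms == 0 ? 0.0 : 1.0 / Math.sqrt(numTerms);
    }

    // "fast car for sale" with made-up synonyms injected at the same positions.
    static Token[] sampleTokens() {
        return new Token[] {
            new Token("fast", 1), new Token("quick", 0), new Token("rapid", 0),
            new Token("car", 1), new Token("automobile", 0), new Token("auto", 0),
            new Token("for", 1),
            new Token("sale", 1), new Token("bargain", 0)
        };
    }

    public static void main(String[] args) {
        Token[] tokens = sampleTokens();
        System.out.println("length, overlaps counted:    " + fieldLength(tokens, false));
        System.out.println("length, overlaps discounted: " + fieldLength(tokens, true));
        System.out.println("norm, overlaps counted:      " + lengthNorm(tokens, false));
        System.out.println("norm, overlaps discounted:   " + lengthNorm(tokens, true));
    }
}
```

In this sketch the four-word field is treated as nine terms long when overlaps are counted, so its norm drops from 1/2 to 1/3 purely because synonyms were injected.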

