lucene-java-user mailing list archives

From Yonik Seeley
Subject Re: best practice: 1.4 billions documents
Date Fri, 26 Nov 2010 14:27:49 GMT
On Mon, Nov 22, 2010 at 12:49 PM, Uwe Schindler wrote:
>  (Fuzzy scores on
> MultiSearcher and Solr are totally wrong because each shard uses another
> rewritten query).

Hmmm, really?  I thought that fuzzy scoring should just rely on edit distance?
Oh wait, I think I see - it's because we can use a hard cutoff for the
number of expansions rather than an edit distance cutoff.  If we used
the latter, everything should be fine?

The fuzzy issue I would classify as "working as designed".  Either
that, or classify FuzzyQuery as broken.  A cutoff based on number of
terms will yield strange results even on a single index.  Consider
this scenario: it's possible to add more docs to a single index and
have the same fuzzy query return fewer docs than it did before!
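
To make that scenario concrete, here is a minimal sketch in plain Java.
It is a toy model, not Lucene's FuzzyQuery: the class FuzzyCutoffDemo,
its methods, the sample terms, and the rule "keep the closest
maxExpansions terms by edit distance" are all assumptions made for
illustration.  It only shows how a cap on the number of expanded terms
can shrink the result set when a new, closer term enters the index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model (hypothetical, not Lucene code): an inverted index of
// term -> doc ids, plus a fuzzy search that keeps only the closest
// maxExpansions terms.
public class FuzzyCutoffDemo {

    // Classic Levenshtein edit distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    static void addDoc(Map<String, Set<Integer>> index, int docId, String term) {
        index.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
    }

    // Expand the query to index terms within maxEdits, keep at most
    // maxExpansions of them (closest first), and union their documents.
    static Set<Integer> fuzzySearch(Map<String, Set<Integer>> index, String query,
                                    int maxEdits, int maxExpansions) {
        List<String> candidates = new ArrayList<>();
        for (String term : index.keySet()) {
            if (editDistance(term, query) <= maxEdits) candidates.add(term);
        }
        // Sort by distance, then alphabetically so ties are deterministic.
        candidates.sort(Comparator
                .comparingInt((String t) -> editDistance(t, query))
                .thenComparing(Comparator.<String>naturalOrder()));
        Set<Integer> hits = new TreeSet<>();
        int limit = Math.min(maxExpansions, candidates.size());
        for (String term : candidates.subList(0, limit)) hits.addAll(index.get(term));
        return hits;
    }

    // Sample index: one exact match and three docs with a distance-2 term.
    static Map<String, Set<Integer>> buildIndex() {
        Map<String, Set<Integer>> index = new HashMap<>();
        addDoc(index, 0, "lucene");   // distance 0 from "lucene"
        addDoc(index, 1, "lucern");   // distance 2
        addDoc(index, 2, "lucern");
        addDoc(index, 3, "lucern");
        return index;
    }

    public static void main(String[] args) {
        Map<String, Set<Integer>> index = buildIndex();
        // Cap of 2 expansions: "lucene" and "lucern" both fit.
        System.out.println(fuzzySearch(index, "lucene", 2, 2).size()); // 4

        // One new doc adds a closer term ("lucent", distance 1).
        addDoc(index, 4, "lucent");
        // "lucent" now pushes "lucern" out of the top 2 expansions.
        System.out.println(fuzzySearch(index, "lucene", 2, 2).size()); // 2

        // With only an edit-distance cutoff, results never shrink.
        System.out.println(
            fuzzySearch(index, "lucene", 2, Integer.MAX_VALUE).size()); // 5
    }
}
```

The same effect is what breaks distributed scoring in this toy model:
each shard would pick its own top expansions from its own term
dictionary, so the rewritten queries differ from shard to shard.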

