lucene-java-user mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: best practice: 1.4 billions documents
Date Fri, 26 Nov 2010 14:27:49 GMT
On Mon, Nov 22, 2010 at 12:49 PM, Uwe Schindler <uwe@thetaphi.de> wrote:
>  (Fuzzy scores on
> MultiSearcher and Solr are totally wrong because each shard uses another
> rewritten query).

Hmmm, really?  I thought that fuzzy scoring should just rely on edit distance?
Oh wait, I think I see - it's because we can use a hard cutoff for the
number of expansions rather than an edit distance cutoff.  If we used
the latter, everything should be fine?
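
To illustrate the point about the two kinds of cutoff, here is a minimal sketch in plain Python. It is not Lucene code: `edit_distance`, `expand`, `max_edits`, `max_expansions`, and the toy shard contents are all assumptions made up for this example. It assumes the expansion keeps the closest terms first, and shows that a cap on the number of expansions can make each shard rewrite the query to a different term set, while a pure edit distance cutoff decides each term independently of the others.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def expand(terms, query, max_edits, max_expansions=None):
    """Hypothetical rewrite step: pick the index terms a fuzzy query expands to."""
    ranked = sorted((edit_distance(query, t), t) for t in terms)
    ranked = [(d, t) for d, t in ranked if d <= max_edits]
    if max_expansions is not None:
        # Hard cap on the number of terms: closest terms win.
        ranked = ranked[:max_expansions]
    return [t for _, t in ranked]


# Toy term dictionaries for two shards and for the same data in one index.
shard_a = {"lucent", "lucerne"}
shard_b = {"lucent", "lucite"}
combined = shard_a | shard_b

# Capped expansion: each shard keeps its own top 2 terms.
capped_a = expand(shard_a, "lucene", max_edits=2, max_expansions=2)
capped_b = expand(shard_b, "lucene", max_edits=2, max_expansions=2)
capped_combined = expand(combined, "lucene", max_edits=2, max_expansions=2)

# Distance-only cutoff: a term is in or out regardless of its neighbours.
dist_a = expand(shard_a, "lucene", max_edits=1)
dist_b = expand(shard_b, "lucene", max_edits=1)
dist_combined = expand(combined, "lucene", max_edits=1)

print(capped_a, capped_b, capped_combined)
print(dist_a, dist_b, dist_combined)
```

In this toy setup, shard B expands to a term ("lucite") that the combined index would have dropped, so the shards are effectively running different queries. With the distance-only cutoff, the union of the per-shard expansions equals the expansion on the combined index.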

The fuzzy issue I would classify as "working as designed".  Either
that, or classify FuzzyQuery as broken.  A cutoff based on the number
of terms will yield strange results even on a single index.  Consider
this scenario: it's possible to add more docs to a single index and
have the same fuzzy query return fewer docs than it did before (the
new docs can introduce closer terms that push previously matching
terms out of the expansion)!
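
The single-index scenario above can be reproduced with a small simulation. This is only a sketch under stated assumptions, not Lucene's implementation: the inverted index is a plain dict from term to document ids, and `edit_distance`, `fuzzy_search`, `max_edits`, and `max_expansions` are hypothetical names introduced for the example. It assumes the expansion keeps the closest terms first.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def fuzzy_search(index, query, max_edits=2, max_expansions=None):
    """Return ids of docs containing any term the fuzzy query expands to."""
    ranked = sorted((edit_distance(query, t), t) for t in index)
    ranked = [(d, t) for d, t in ranked if d <= max_edits]
    if max_expansions is not None:
        # Hard cap on the number of expanded terms: closest terms win.
        ranked = ranked[:max_expansions]
    hits = set()
    for _, term in ranked:
        hits |= index[term]
    return hits


# Toy inverted index: term -> set of doc ids.
index = {
    "lucent": {0},        # distance 1 from "lucene"
    "lucite": {1, 2, 3},  # distance 2 from "lucene"
}
before = fuzzy_search(index, "lucene", max_edits=2, max_expansions=2)

# Add one more document whose term is closer to the query than "lucite".
index["lucerne"] = {4}    # distance 1 from "lucene"
after = fuzzy_search(index, "lucene", max_edits=2, max_expansions=2)

# The same query with only an edit distance cutoff, for comparison.
after_distance_only = fuzzy_search(index, "lucene", max_edits=2)

print(len(before), len(after), len(after_distance_only))
```

Here the new term "lucerne" takes the second expansion slot from "lucite", so the three docs that matched through "lucite" disappear from the results even though nothing was deleted. With only the edit distance cutoff, adding docs can never shrink the result set.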

-Yonik
http://www.lucidimagination.com
