lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: FuzzyQuery prefix length
Date Mon, 11 Oct 2004 16:20:54 GMT
Daniel Naber wrote:
> I agree that the default should stay 0, even for Lucene 2.0.

It should certainly stay zero for 1.4.x releases.

However 2.0 is our opportunity to make incompatible changes.  What is 
the best default for this, that will work well for the most applications?

Does anyone have fuzzy-query benchmarks for, e.g., ~1M document indexes, 
where each document contains a few k of text?  Ideally with such 
indexes, even complex queries should take less than a second, no?  How 
long does a fuzzy query take?  And how much does a prefix of zero, one, 
or two change that?  Queries that take much longer than a second are 
considerably less usable.  I think the default should provide good
usability for indexes of at least 1M documents.
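One rough way to see why the prefix matters for speed: with a nonzero prefix, the term enumeration can seek to the prefix in the sorted term dictionary and stop once terms no longer start with it, so fewer terms need an edit-distance computation. The sketch below counts terms examined as a crude proxy for query cost. It assumes an in-memory `TreeSet` in place of a real term dictionary; the class and method names are hypothetical, and this is not Lucene's actual `FuzzyTermEnum` code. Real benchmarks on ~1M-document indexes would still be needed to measure wall-clock time.

```java
import java.util.Arrays;
import java.util.SortedSet;
import java.util.TreeSet;

public class FuzzyScanCost {
    // Counts dictionary terms that would be compared against the query term.
    // Seeks to the first term >= prefix (tailSet) and stops at the first term
    // that no longer starts with the prefix. Prefix length 0 scans everything.
    static int termsExamined(SortedSet<String> dict, String query, int prefixLength) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        int count = 0;
        for (String term : dict.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can match the prefix
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical toy vocabulary standing in for an index's term dictionary.
        SortedSet<String> dict = new TreeSet<>(Arrays.asList(
                "apple", "apply", "maple", "ample", "apricot", "banana", "bandana", "applet"));
        for (int p = 0; p <= 2; p++) {
            System.out.println("prefix " + p + ": "
                    + termsExamined(dict, "appel", p) + " terms examined");
        }
    }
}
```

On a real index the gap between prefix zero (every term in the field) and prefix one or two is far larger than in this toy vocabulary.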

Another thing to examine is how different the generated terms are with 
different prefixes.  One could randomly select some words from an index 
and compute the average amount that a prefix of one and two changes the 
end results.  My guess is that the changes are small.  Since fuzzy 
search is a heuristic, not an exact computation, good approximations are 
fair play.
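The experiment described above (sample words from the index, then measure how much a prefix of one or two changes the expanded terms) could be sketched like this. It assumes plain Levenshtein distance with a fixed edit cap, which is a simplification of Lucene's minimum-similarity threshold, and a small hypothetical vocabulary in place of a real term dictionary. Because a prefix only filters the prefix-zero expansion, the fraction of terms retained is a direct measure of how much the result changes.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

public class PrefixOverlap {
    // Standard Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    // Terms within maxEdits of the query that share its first prefixLength chars.
    static Set<String> expand(Collection<String> vocab, String query, int maxEdits, int prefixLength) {
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        Set<String> out = new TreeSet<>();
        for (String t : vocab) {
            if (t.startsWith(prefix) && editDistance(query, t) <= maxEdits) out.add(t);
        }
        return out;
    }

    // Fraction of the prefix-0 expansion that survives with the given prefix.
    static double retained(Collection<String> vocab, String query, int maxEdits, int prefixLength) {
        Set<String> full = expand(vocab, query, maxEdits, 0);
        if (full.isEmpty()) return 1.0;
        return (double) expand(vocab, query, maxEdits, prefixLength).size() / full.size();
    }

    // Average retention over randomly sampled query words from the vocabulary.
    static double averageRetained(List<String> vocab, int samples, int maxEdits,
                                  int prefixLength, long seed) {
        Random rnd = new Random(seed);
        double sum = 0;
        for (int i = 0; i < samples; i++) {
            String q = vocab.get(rnd.nextInt(vocab.size()));
            sum += retained(vocab, q, maxEdits, prefixLength);
        }
        return sum / samples;
    }

    public static void main(String[] args) {
        List<String> vocab = Arrays.asList("lucene", "lucent", "lucerne", "leucine", "search",
                "starch", "fuzzy", "fussy", "buzzy", "index", "indexes", "indices");
        for (int p = 1; p <= 2; p++) {
            System.out.printf("prefix %d: %.3f of terms retained on average%n",
                    p, averageRetained(vocab, 1000, 2, p, 42L));
        }
    }
}
```

Run against words drawn from a real index, a retention close to 1.0 would support the guess that the changes are small.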

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

