lucene-dev mailing list archives

From Bernhard Messer <Bernhard.Mes...@intrafind.de>
Subject Re: FuzzyQuery prefix length
Date Mon, 11 Oct 2004 21:20:09 GMT
Doug Cutting wrote:

> Does anyone have fuzzy-query benchmarks for, e.g., ~1M document 
> indexes, where each document contains a few k of text?  Ideally with 
> such indexes, even complex queries should take less than a second, 
> no?  How long does a fuzzy query take?  And how much does a prefix of 
> zero, one, or two change that?  Queries that take much longer than a 
> second are considerably less usable.  I think the default should 
> provide good usability for indexes of at least 1M documents. 

I have an index containing about 800k documents, each with a few KB of 
text. Every Lucene document in the index has about 12 fields, and the 
overall index size is about 2.8 GB.
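
To see why the prefix length matters for query time, here is a toy Java 
sketch. It is not Lucene code: the class `PrefixScanCost`, the method 
`candidates`, and the sample vocabulary are hypothetical. It assumes a 
fuzzy query computes an edit distance only against terms that share the 
first `prefixLength` characters of the query term, so the prefix bounds 
how many terms have to be scored.

```java
import java.util.Arrays;
import java.util.List;

public class PrefixScanCost {
    // Counts how many terms a prefix-restricted fuzzy scan would have to
    // score (hypothetical model: only terms sharing the prefix are visited).
    static int candidates(List<String> terms, String target, int prefixLength) {
        String prefix = target.substring(0, Math.min(prefixLength, target.length()));
        int count = 0;
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up vocabulary standing in for a field's term dictionary.
        List<String> terms = Arrays.asList(
            "lucene", "lucent", "lucid", "lumen", "lamp", "search", "seek", "sucene");
        for (int p = 0; p <= 2; p++) {
            System.out.println("prefix " + p + ": "
                + candidates(terms, "lucene", p) + " candidates");
        }
    }
}
```

With this vocabulary the scan shrinks from 8 candidates (prefix 0) to 5 
(prefix 1) and 4 (prefix 2); on a real index with millions of terms the 
reduction would be far larger.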

> Another thing to examine is how different the generated terms are with 
> different prefixes.  One could randomly select some words from an 
> index and compute the average amount that a prefix of one and two 
> changes the end results.  My guess is that the changes are small.  
> Since fuzzy search is a heuristic, not an exact computation, good 
> approximations are fair play.
>
If that fits your needs, I can create and run a test for benchmarking 
queries.
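
The experiment proposed in the quote (pick random words, measure how much 
a prefix of one or two changes the results) could be prototyped roughly as 
below. This is a self-contained sketch, not Lucene's term enumeration: the 
similarity `1 - distance / min(length)`, the 0.5 threshold, and all names 
(`PrefixOverlap`, `matches`, `averageRetained`) are assumptions chosen for 
illustration. It samples query words with a seeded random generator and 
reports the average fraction of prefix-0 matches that survive a longer 
prefix.

```java
import java.util.*;

public class PrefixOverlap {
    static final double MIN_SIM = 0.5; // assumed similarity threshold

    // Dynamic-programming Levenshtein edit distance using two rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] cur = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            cur[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                cur[j] = Math.min(Math.min(cur[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = cur; cur = tmp;
        }
        return prev[b.length()];
    }

    // Terms that share the first prefixLength chars of target and whose
    // assumed similarity 1 - d / min(len) reaches minSim.
    static Set<String> matches(List<String> terms, String target, int prefixLength, double minSim) {
        String prefix = target.substring(0, Math.min(prefixLength, target.length()));
        Set<String> out = new TreeSet<>();
        for (String t : terms) {
            if (!t.startsWith(prefix)) continue;
            int minLen = Math.min(t.length(), target.length());
            if (minLen == 0) continue;
            double sim = 1.0 - (double) editDistance(t, target) / minLen;
            if (sim >= minSim) out.add(t);
        }
        return out;
    }

    // Average fraction of prefix-0 matches still found with prefixLength,
    // over sampleSize query words drawn from the (non-empty) vocabulary.
    static double averageRetained(List<String> terms, int prefixLength, int sampleSize, long seed) {
        Random rnd = new Random(seed);
        double total = 0;
        for (int i = 0; i < sampleSize; i++) {
            String q = terms.get(rnd.nextInt(terms.size()));
            Set<String> full = matches(terms, q, 0, MIN_SIM); // always contains q itself
            Set<String> restricted = matches(terms, q, prefixLength, MIN_SIM);
            total += (double) restricted.size() / full.size();
        }
        return total / sampleSize;
    }

    public static void main(String[] args) {
        List<String> vocab = Arrays.asList(
            "lucene", "lucent", "sucene", "lucid", "lumen", "lamp", "search", "seek", "leek");
        for (int p = 1; p <= 2; p++) {
            System.out.printf("prefix %d keeps %.2f of prefix-0 matches%n",
                p, averageRetained(vocab, p, 100, 42L));
        }
    }
}
```

In this toy vocabulary, `sucene` is one edit away from `lucene` and is 
found with prefix 0 but lost as soon as the prefix is 1, which is exactly 
the kind of change such a test would measure. On a real index the 
vocabulary would come from the field's term dictionary.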

regards
Bernhard

> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
>
