lucene-java-user mailing list archives

From mark harwood <markharw...@yahoo.co.uk>
Subject Re: Speed of fuzzy searches
Date Thu, 02 Apr 2009 16:49:25 GMT

Try setting the minimum prefix length for fuzzy queries (I think there is a setting on QueryParser,
or you may need to subclass it).

A prefix length of zero does edit distance comparisons against all unique terms in the index,
e.g. everything from "aardvark" to "zzzz".
A prefix length of one, for a query such as "cow~", would cut this search space down to just the
terms that start with "c", e.g. "car" to "czar".

You get the picture: each increment of prefix length gives a massive reduction in CPU usage,
but you need to balance that against the inability to match "cow" with "kow".
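
To make the trade-off concrete, here is a minimal, self-contained sketch of the idea in plain Java. It is not Lucene's implementation and uses no Lucene classes: the class name PrefixFuzzyDemo, its methods, and the tiny term list are all hypothetical, and a sorted TreeSet stands in for the index's term dictionary. It only shows how requiring a shared prefix shrinks the set of terms that need an edit distance comparison.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical illustration only; not part of Lucene.
final class PrefixFuzzyDemo {

    // Levenshtein edit distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Terms that still need an edit distance comparison for this prefix length.
    // With prefixLength 0 that is every term; otherwise only the sorted range
    // of terms sharing the query's first prefixLength characters.
    static NavigableSet<String> candidates(NavigableSet<String> terms,
                                           String query, int prefixLength) {
        if (prefixLength <= 0) {
            return terms;
        }
        String prefix = query.substring(0, Math.min(prefixLength, query.length()));
        // Simplification: the upper bound assumes no term contains '\uffff'.
        return terms.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    }

    // Candidates within maxEdits of the query, in sorted term order.
    static List<String> fuzzyMatch(NavigableSet<String> terms, String query,
                                   int prefixLength, int maxEdits) {
        List<String> hits = new ArrayList<>();
        for (String term : candidates(terms, query, prefixLength)) {
            if (editDistance(query, term) <= maxEdits) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(Arrays.asList(
                "aardvark", "car", "cot", "cow", "czar", "kow", "zzzz"));
        for (int prefixLength = 0; prefixLength <= 1; prefixLength++) {
            System.out.println("prefix length " + prefixLength
                    + ": compared " + candidates(terms, "cow", prefixLength).size()
                    + " terms, matched " + fuzzyMatch(terms, "cow", prefixLength, 1));
        }
    }
}
```

With this toy term list, a prefix length of zero compares all seven terms and still finds "kow", while a prefix length of one compares only the four terms starting with "c" and no longer matches "kow".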

Cheers
Mark



----- Original Message ----
From: Matt Schraeder <MSchraeder@btsb.com>
To: java-user@lucene.apache.org
Sent: Thursday, 2 April, 2009 17:16:57
Subject: Speed of fuzzy searches

I've got a simple Lucene index and search built for testing purposes. 
So far everything seems great. Most searches take 0.02 seconds or less.
Searches with 4-5 terms take 0.25 seconds or less.  However, once I
began playing with fuzzy searches everything seemed to really slow down.
A fuzzy search seems to take vastly longer: 6 seconds for a single
term such as "cow~" and 24 seconds for fuzzy searches of multiple
terms.

Is there anything I can do to speed up fuzzy searches or are they by
default just simply slow?  

My index is only 6.1M, with ~18000 documents.  Each document has 5
fields, a combination of text and keywords. I'm afraid that when I begin
to scale up to have more fields it will only make the problem worse.



      

