lucene-java-user mailing list archives

From "Michael Tobias" <mich...@tobias.org.uk>
Subject Fuzzy Searching on Lucene / Solr
Date Wed, 14 Aug 2013 05:00:22 GMT
My first post so please be gentle with me.

I am about to start 'playing' with Solr to see if it will be the correct
tool for a new searchable database development.  One of my requirements is
the ability to do 'fuzzy' searches, and I understand that the latest versions
of Lucene / Solr use improved indexing and the Levenshtein distance formula
(or, optionally, a modified version of Levenshtein that treats a
transposition of two adjacent letters as a single difference rather than two).
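
For illustration, the difference between the two distance measures mentioned
above can be sketched in plain Java. This is a minimal sketch under stated
assumptions, not Lucene's implementation: the class and method names are
hypothetical, and the transposition-aware variant shown is the simple
"optimal string alignment" form, which may differ in detail from what Lucene
uses internally.

```java
// Illustrative sketch only: EditDistanceSketch and its methods are
// hypothetical names, not part of Lucene or Solr.
final class EditDistanceSketch {

    /** Classic Levenshtein: insertions, deletions and substitutions cost 1 each. */
    static int levenshtein(String a, String b) {
        return distance(a, b, false);
    }

    /** Variant that also counts a swap of two adjacent letters as one edit. */
    static int withTranspositions(String a, String b) {
        return distance(a, b, true);
    }

    private static int distance(String a, String b, boolean transpositions) {
        // d[i][j] = edits needed to turn the first i chars of a into the first j chars of b
        int[][] d = new int[a.length() + 1][b.length() + 1];
        for (int i = 0; i <= a.length(); i++) d[i][0] = i;
        for (int j = 0; j <= b.length(); j++) d[0][j] = j;

        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int best = Math.min(
                        Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1), // delete / insert
                        d[i - 1][j - 1] + cost);                    // match / substitute
                // Adjacent letters swapped: count the swap as a single edit.
                if (transpositions && i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    best = Math.min(best, d[i - 2][j - 2] + 1);
                }
                d[i][j] = best;
            }
        }
        return d[a.length()][b.length()];
    }

    public static void main(String[] args) {
        // "lucnee" is "lucene" with the letters 'e' and 'n' swapped.
        System.out.println(levenshtein("lucene", "lucnee"));
        System.out.println(withTranspositions("lucene", "lucnee"));
    }
}
```

With the swapped pair in "lucnee", the classic measure counts two
substitutions, while the transposition-aware variant counts a single edit.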

Levenshtein is precisely what I need, but I also understand that the maximum
distance currently implemented is just TWO.  That is not really adequate for
my purposes.  I need to be able to handle a distance of at least 3 and
probably 4.

Is the current maximum distance of 2 hard-coded in the system?  Can it be
overridden?  If so, how?

I understand that performance (both indexing and querying) may be impaired
significantly by doing this, but that might be a price worth paying.  If it
IS possible to change the max distance to 3 or 4, does anybody have any idea
what the performance implications might be?

Many thanks for any/all assistance you can provide.

Regards

Michael