lucene-java-user mailing list archives

From Andrzej Bialecki ...@getopt.org>
Subject Re: combining open office spellchecker with Lucene
Date Thu, 09 Sep 2004 16:15:08 GMT
David Spencer wrote:
> 
> I can/should send the code out. The logic is that for any terms in a 
> query that have zero matches, go thru all the terms(!) and calculate the 
> Levenshtein string distance, and return the best matches. A more 
> intelligent way of doing this is to instead look for terms that also 
> match on the 1st "n" (prob 3) chars.

...or prepare a fast lookup index in advance: split all existing terms 
into bi- or trigrams, store them in a separate lookup index, and then 
for each input term run a phrase query (phrase = all n-grams from the 
input term) with a slop > 0 to get similar existing terms. This should 
be fast, and you could provide a "did you mean" function too...
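
A rough sketch of the n-gram lookup idea, to make it concrete. Note the 
assumptions: this is plain Java with an in-memory map standing in for 
the separate lookup index, and it ranks candidates by the number of 
shared n-grams rather than running an actual Lucene phrase query with 
slop. The class and method names (NGramSuggester, addTerm, suggest) are 
made up for illustration and are not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory stand-in for a separate n-gram lookup index. */
class NGramSuggester {
    private final int n;
    // Maps each n-gram to the set of indexed terms that contain it.
    private final Map<String, Set<String>> index = new HashMap<>();

    NGramSuggester(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        this.n = n;
    }

    /** Distinct character n-grams of a term; a term no longer than n is kept whole. */
    Set<String> ngrams(String term) {
        Set<String> grams = new HashSet<>();
        if (term.length() <= n) {
            grams.add(term);
            return grams;
        }
        for (int i = 0; i + n <= term.length(); i++) {
            grams.add(term.substring(i, i + n));
        }
        return grams;
    }

    /** Adds an existing index term to the lookup structure (done in advance). */
    void addTerm(String term) {
        for (String g : ngrams(term)) {
            index.computeIfAbsent(g, k -> new HashSet<>()).add(term);
        }
    }

    /** Up to max indexed terms, ordered by shared n-gram count (ties alphabetical). */
    List<String> suggest(String input, int max) {
        Map<String, Integer> shared = new HashMap<>();
        for (String g : ngrams(input)) {
            for (String t : index.getOrDefault(g, Collections.<String>emptySet())) {
                shared.merge(t, 1, Integer::sum);
            }
        }
        List<String> result = new ArrayList<>(shared.keySet());
        result.sort((a, b) -> {
            int byCount = Integer.compare(shared.get(b), shared.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return result.size() > max ? new ArrayList<>(result.subList(0, max)) : result;
    }

    public static void main(String[] args) {
        NGramSuggester s = new NGramSuggester(2); // bigrams
        for (String t : new String[] {"lucene", "lucent", "license", "search"}) {
            s.addTerm(t);
        }
        System.out.println(s.suggest("lucine", 3)); // candidates for a misspelled term
    }
}
```

Only terms sharing at least one n-gram with the input are ever touched, 
which is the point of building the lookup in advance instead of 
scanning every term. The short candidate list could then be re-ranked 
with Levenshtein distance if a finer ordering is wanted.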

-- 
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)



