lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mark harwood <markharw...@yahoo.co.uk>
Subject Re: Funny results with Fuzzy
Date Tue, 25 Oct 2005 17:13:47 GMT
> One thing I was thinking of doing was checking the
> character frequency 

An alternative idea is index-time fuzzification rather
than query-time. This is documented in one of the case
studies in LIA - the principle is you don't
index/search for whole words but use an NGram Analyzer
to break them up at index time:

Kylie becomes multiple words:
[ k]
[ ky]
[ kyl]
[ky]
[kyl]
[kyli]
[yl]
[yli]
[ylie]
[ kylie ]

Obviously you use the same analyzer to process
queries.
Lucene will automatically look after relevancy of
partial matches for you but your indexes are bigger
and your queries will generate many more Boolean
clauses.





	
	
		
___________________________________________________________ 
Yahoo! Messenger - NEW crystal clear PC to PC calling worldwide with voicemail http://uk.messenger.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message