lucene-java-user mailing list archives

From Daniel Noll <dan...@nuix.com>
Subject Re: double metaphone for misspellings
Date Thu, 18 Dec 2008 05:44:38 GMT
Geoff Hendrey wrote:
> ((POINameType)name).getText().split("\\s"); //tokenize manually. (gosh, I thought the analyser would do this)

The analyser does do this. Related to that, the Right Way to do it
in your case would be to write your own analyser specifically for that
field, and do all the metaphone magic inside that analyser.

You probably want an analyser which chains onto the StandardAnalyzer and 
adds an additional token filter to do the double metaphone stuff. 
Writing a token filter isn't too hard; PorterStemFilter is a relatively
good example of doing something similar.
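A minimal, self-contained sketch of the tokenizer-plus-filter chain, written in plain Java rather than against Lucene. Everything in it is hypothetical: `tokenize` stands in for StandardAnalyzer's tokenizing, `phoneticFilter` stands in for a DoubleMetaphoneFilter, and classic American Soundex is swapped in for double metaphone only because the real algorithm is far too long to inline. In real code the encoding could come from a library such as Apache Commons Codec's `DoubleMetaphone` class, and the filter would extend Lucene's `TokenFilter`, whose method signatures differ between Lucene versions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical plain-Java model of an analyser chain; this is NOT the Lucene API.
public class PhoneticChainSketch {

    // Tokenizer stage: lowercase, then split on runs of anything that is not a letter or digit.
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Filter stage: replace each token with its phonetic code.
    // Soundex is a stand-in here; the filter described in the message would use double metaphone.
    public static List<String> phoneticFilter(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String code = soundex(t);
            out.add(code.isEmpty() ? t : code); // tokens without A-Z letters pass through unchanged
        }
        return out;
    }

    // The "analyser": the tokenizer followed by the filter, applied in order.
    public static List<String> analyze(String text) {
        return phoneticFilter(tokenize(text));
    }

    // Classic American Soundex: first letter plus three digits, zero-padded.
    public static String soundex(String word) {
        String s = word.toUpperCase(Locale.ROOT).replaceAll("[^A-Z]", "");
        if (s.isEmpty()) {
            return "";
        }
        // Digit code for each letter A..Z; '0' marks letters that are not coded (vowels, H, W, Y).
        String codes = "01230120022455012623010202";
        StringBuilder out = new StringBuilder().append(s.charAt(0));
        char prev = codes.charAt(s.charAt(0) - 'A');
        for (int i = 1; i < s.length() && out.length() < 4; i++) {
            char c = s.charAt(i);
            char code = codes.charAt(c - 'A');
            if (code != '0' && code != prev) {
                out.append(code); // adjacent repeats of the same digit are coded once
            }
            if (c != 'H' && c != 'W') {
                prev = code; // H and W do not separate equal digits; vowels do
            }
        }
        while (out.length() < 4) {
            out.append('0');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(analyze("Robert  Smith")); // [R163, S530]
        System.out.println(analyze("Rupert Smyth"));  // [R163, S530]
    }
}
```

Because both spellings reduce to the same codes, a query run through the same chain matches either one. That is the reason to do the encoding inside the analyser rather than splitting and encoding the text by hand.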

So you would end up with a DoubleMetaphoneFilter, which you could then
use with PerFieldAnalyzerWrapper so that it applies only to the fields
that need phonetic matching.
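The per-field routing can be sketched the same way. This is again plain Java and not Lucene's API: the class, its `addAnalyzer` method and the `phoneticPlaceholder` analyser are hypothetical, and the placeholder only tags tokens at the point where the real field analyser would run the phonetic filter. It mirrors the idea behind PerFieldAnalyzerWrapper: one default analyser, plus overrides for specific field names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical plain-Java model of per-field analyser routing; NOT Lucene's PerFieldAnalyzerWrapper.
public class PerFieldSketch {
    private final Function<String, List<String>> defaultAnalyzer;
    private final Map<String, Function<String, List<String>>> perField = new HashMap<>();

    public PerFieldSketch(Function<String, List<String>> defaultAnalyzer) {
        this.defaultAnalyzer = defaultAnalyzer;
    }

    // Register an override analyser for one field name.
    public void addAnalyzer(String field, Function<String, List<String>> analyzer) {
        perField.put(field, analyzer);
    }

    // Use the field's override if one is registered, otherwise fall back to the default.
    public List<String> analyze(String field, String text) {
        return perField.getOrDefault(field, defaultAnalyzer).apply(text);
    }

    // Default analyser: lowercase and split on runs of non-letters/digits.
    public static List<String> standardLike(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Placeholder for the phonetic chain: tags each token instead of encoding it.
    public static List<String> phoneticPlaceholder(String text) {
        List<String> out = new ArrayList<>();
        for (String t : standardLike(text)) {
            out.add("PHONETIC(" + t + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        PerFieldSketch wrapper = new PerFieldSketch(PerFieldSketch::standardLike);
        wrapper.addAnalyzer("name", PerFieldSketch::phoneticPlaceholder);
        System.out.println(wrapper.analyze("name", "Blue Moon Cafe"));
        // [PHONETIC(blue), PHONETIC(moon), PHONETIC(cafe)]
        System.out.println(wrapper.analyze("address", "12 Main Street"));
        // [12, main, street]
    }
}
```

Fields with no override fall through to the default analyser, so only the name field gets phonetic encoding. The same wrapper would be used at both indexing and query time so that each field's terms are produced consistently.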

Daniel


-- 
Daniel Noll                            Forensic and eDiscovery Software
Senior Developer                              The world's most advanced
Nuix                                                email data analysis
http://nuix.com/                                and eDiscovery software
