lucene-java-user mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: i18n query normalization
Date Tue, 23 Aug 2005 17:38:12 GMT

>    We have a multilingual index and we need to match accented
>characters with unaccented characters. For example, if a document
>contains: mângão, the query: mangao should match it.
>
>     I guess I would have to build some sort of analyzer/tokenizer for this.
>
>     I was wondering if there are tokenizers already built for Lucene.

Search the archives for an earlier discussion about this, from back
in June I believe. I suggested using ICU to generate sort keys, and
indexing those.
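
A rough sketch of the sort-key idea, in case it helps. Note the assumptions: this uses the JDK's java.text.Collator as a stand-in for ICU's collator so that it runs without extra jars, and the class name SortKeyDemo and the method sortKeyHex are made up for illustration. The point is that at primary strength the collator ignores accent and case differences, so the accented and unaccented forms should produce the same key. You would apply the same function to terms at index time and at query time.

```java
import java.text.Collator;
import java.util.Locale;

// Hypothetical helper, not part of Lucene or ICU.
class SortKeyDemo {

    // Returns a hex-encoded primary-strength collation key for a term.
    // Assumption: the JDK collator stands in for ICU's collator here.
    static String sortKeyHex(String term) {
        Collator collator = Collator.getInstance(Locale.ROOT);
        // PRIMARY strength: only base letters count; accents and case are ignored.
        collator.setStrength(Collator.PRIMARY);
        // Decompose precomposed characters (e.g. U+00E2 -> 'a' + combining circumflex).
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);

        byte[] key = collator.getCollationKey(term).toByteArray();
        StringBuilder hex = new StringBuilder(key.length * 2);
        for (byte b : key) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes avoid source-encoding problems: this is "mângão".
        String indexed = sortKeyHex("m\u00e2ng\u00e3o");
        String query = sortKeyHex("mangao");
        System.out.println("indexed key: " + indexed);
        System.out.println("query key:   " + query);
        System.out.println("keys match:  " + indexed.equals(query));
    }
}
```

The hex encoding is only there to get a printable string that can be stored as an index term; any stable byte-to-string encoding would do. Keys are only comparable when produced by the same collator rules, so the index would need rebuilding if the collator version changes.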

-- Ken
-- 
Ken Krugler
TransPac Software, Inc.
<http://www.transpac.com>
+1 530-470-9200
