lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Sokolov <soko...@ifactory.com>
Subject Re: WELCOME to java-user@lucene.apache.org
Date Sun, 03 Jul 2011 16:05:27 GMT
You should take a look at org.apache.solr.analysis.MappingCharFilter, 
which provides a generic table-based approach for use with solr.  There 
are also a lot of other interesting CharFilters in the same package.  
For lucene-only use, there's 
org.apache.lucene.analysis.icu.ICUFoldingFilter, which looks very 
promising, and I'm sure there are others I don't know about.

-Mike

On 7/3/2011 8:38 AM, Jan Rothhaar wrote:
> Hi everybody,
>
> I have a pretty generic question about token filters, and I am not really sure whether
it is a developer or a configuration question:
>
> How exactly do I make lucene map letters to each other, e.g. make it treat both 'a' and
'á' as one and the same letter, or both '写' and '寫' one and the same character? I am
sure this question has appeared before and there are sample implementations or sample configuration
files out there, but I could not find them on my own.
>
> I only need to map single letters (i.e. no 'oe'<=>  'ö'), but in a multi-byte
charset. I have some modest experience in programming in java, but am far from being a guru.
>
> Any help is appreciated.
>
> Thanks in advance,
>
> Jan


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message