lucene-java-user mailing list archives

From "Ian Parkin" <>
Subject How to handle umlauts ?
Date Wed, 18 Sep 2002 20:55:57 GMT
Hello all,

I suspect the answer will involve Unicode, but I'd like to make sure that I
am going down the right path here.

I have 100,000+ small HTML files that are mainly in the English language. I
just noticed that we have some user names with umlauts. These appear to be
stored, and are searchable, as the '?' character.
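To illustrate the symptom (this is not a claim about what the Lucene demo does internally), here is a minimal Java sketch of one common way an umlaut turns into '?'. When text is encoded into a charset that has no mapping for 'ü', such as US-ASCII, `String.getBytes(Charset)` substitutes that charset's replacement byte, which for US-ASCII is '?'. The class and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UmlautDemo {
    // Encode the text in the given charset, then decode it again with the same
    // charset. Characters the charset cannot represent come back as its
    // replacement character ('?' for US-ASCII).
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String name = "J\u00fcrgen M\u00fcller"; // "Jürgen Müller", written with escapes
        System.out.println(roundTrip(name, StandardCharsets.US_ASCII));                // prints J?rgen M?ller
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1).equals(name)); // prints true
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name));      // prints true
    }
}
```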

My code is based on the demo code that is provided with Lucene, under the 
'demo' directory.

What changes will I need to make to handle characters such as umlauts
within English text?
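If a charset mismatch is the cause, one possible change is to decode each file with the charset it was actually written in, rather than relying on the JVM's platform default. Below is a minimal sketch, assuming the pages are ISO-8859-1 (they could equally be UTF-8; each page's own charset declaration would tell). `ExplicitCharsetRead` and `readWithCharset` are hypothetical names, not part of the Lucene demo, and where the decoded text gets handed to the demo's HTML parser is left open.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ExplicitCharsetRead {
    // Hypothetical helper: read a whole file, decoding its bytes with an
    // explicitly chosen charset instead of the platform default.
    static String readWithCharset(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader in = Files.newBufferedReader(file, charset)) {
            char[] buf = new char[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("page", ".html");
        // Simulate a Latin-1 encoded page containing an umlaut.
        Files.write(tmp, "<p>J\u00fcrgen</p>".getBytes(StandardCharsets.ISO_8859_1));
        String text = readWithCharset(tmp, StandardCharsets.ISO_8859_1);
        System.out.println(text.contains("J\u00fcrgen")); // prints true
        Files.delete(tmp);
    }
}
```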




