lucene-java-user mailing list archives

From "Juan Pablo Morales"
Subject Storing special characters in Lucene
Date Thu, 21 Aug 2008 17:16:50 GMT
I have an index in Spanish and I use Snowball to stem and analyze and it
works perfectly. However, I am running into trouble storing (not indexing,
only storing) words that have special characters.

That is, I store the special character, but it is garbled when I read
it back.
To provide an example:

String content = "niños";
doc.add(new Field("name", content, Store.YES, Index.TOKENIZED));
writer.addDocument(doc, new SnowballAnalyzer("Spanish"));
When I read the field back:
String nombre = doc.get("name");

Then nombre will contain "ni�os"

Looking at the index with Luke it shows me "ni&#65533;os" but when I want to
see the full text (by right clicking) it shows me ni�os.
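
One way to narrow this down is to check whether the string is already damaged before it ever reaches Lucene, for instance if the source file is saved in one encoding and compiled with another. The following is only a minimal sketch under that assumption; the class name CodePointDump and the helper dump are hypothetical and not part of Lucene. It prints the code points of a string, so a correct "niños" shows U+00F1 for the ñ, while an already broken string shows U+FFFD (the replacement character).

```java
public class CodePointDump {

    // Hypothetical helper: returns the code points of s as space-separated U+XXXX values.
    public static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u00F1 is the letter ñ, written as an escape so that the
        // encoding of the source file cannot alter it.
        String content = "ni\u00F1os";
        System.out.println(dump(content));
    }
}
```

Calling dump on the value just before document creation and again on the value returned by doc.get("name") would show on which side the character is lost. If the first dump already contains U+FFFD, the problem is in how the string was read or compiled, not in how the field is stored.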

I know Lucene is supposed to store fields in UTF-8, but then, how can I make
sure I store something and get it back just as it was, including special
characters?

Juan Pablo Morales
Ingenian Software ltda
Bogotá, Colombia