lucene-java-user mailing list archives

From "Tomi NA" <>
Subject accented characters, wildcards and other problems
Date Thu, 13 Jul 2006 12:19:31 GMT
I've done a bit of testing with accented characters (Croatian, to be
specific) and can't really explain what I see when I explore the index
with Luke.
I've used accented characters in directory names, file names and file contents.
Now, in the list of terms (in "Top ranking terms", "Overview" tab) I
see that 2 out of 5 terms are misrepresented, but are indexed.
The file names containing the problematic characters contain these
characters themselves, i.e. if the file name is "file[x].txt", the
file contents are "test[x]", where [x] represents the accented
character. What I'm not clear on is how I can see the problematic
*terms* in the list of terms, but not the documents they're stored in.
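
One possible explanation for misrepresented terms, offered only as an assumption since the message does not say how the files were read, is a charset mismatch: the file is written in one encoding and decoded in another before it reaches the analyzer. The sketch below uses only the JDK (Java 7 or later for StandardCharsets); the class and method names are hypothetical, and "testč" stands in for the "test[x]" contents described above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Lucene or Luke.
class EncodingMismatchDemo {

    // Encode the text as it would be stored on disk, then decode it with a
    // (possibly different) charset, as an indexer reading the file might do.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        byte[] onDisk = text.getBytes(writtenAs);
        return new String(onDisk, readAs);
    }

    public static void main(String[] args) {
        String original = "test\u010d"; // "testč", c with caron

        String matching = roundTrip(original,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        String mismatched = roundTrip(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        // Same charset on both sides: the term survives intact.
        System.out.println(matching.equals(original));
        // Different charsets: the two UTF-8 bytes of the accented character
        // are decoded as two separate characters, so the term changes.
        System.out.println(mismatched.equals(original));
        System.out.println(mismatched.length());
    }
}
```

If something like this is happening, the term stored in the index is no longer the string typed into the search box, which would be consistent with a term that shows up in the term list but cannot be found by searching for its correct spelling.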

That's one issue. The other is somewhat simpler, I expect.
A search for "test*" returns no results. According to the FAQ, it
should, so what am I missing?
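
A prefix query such as "test*" matches by scanning a field's sorted term list for terms that start with the prefix. The sketch below imitates that scan with a plain TreeSet rather than the Lucene API; the class name, method name and sample terms are hypothetical. It suggests that a garbled accented suffix alone would not hide a term from "test*", so it may be worth checking in Luke whether the field being searched actually contains terms beginning with "test".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical demo class; a stand-in for a per-field sorted term list.
class PrefixScanDemo {

    // Collect every term that starts with the prefix. In a sorted set all
    // such terms are contiguous and begin at the first term >= prefix.
    static List<String> termsWithPrefix(TreeSet<String> sortedTerms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : sortedTerms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // past the end of the prefix range
            }
            hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Assumed terms of a contents field, one of them mis-decoded.
        TreeSet<String> contentsTerms = new TreeSet<>();
        contentsTerms.add("test");
        contentsTerms.add("test\u00c4\u008d"); // garbled form of "testč"
        contentsTerms.add("text");

        // Assumed terms of some other field that holds no "test..." terms.
        TreeSet<String> otherFieldTerms = new TreeSet<>();
        otherFieldTerms.add("docs");
        otherFieldTerms.add("file");

        System.out.println(termsWithPrefix(contentsTerms, "test").size());
        System.out.println(termsWithPrefix(otherFieldTerms, "test").size());
    }
}
```

Under these assumptions the scan finds both "test" terms in the first set and none in the second, so an empty result for "test*" points more toward the wrong field, or toward terms that do not begin with "test" at all, than toward the accented character itself.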

