lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: accented characters, wildcards and other problems
Date Thu, 13 Jul 2006 16:53:58 GMT
Bok Tomi,

What do you mean by "terms are misrepresented"?  What should they be, and what are you seeing?

> What I'm not clear on is how can I see the problematic *terms* in the list of terms,
> but not the documents they're stored in?

Are you saying that the content got indexed, but the file names did not?

Out of curiosity (note my last name), what analyzer/tokenizer are you using?
Is there an equivalent of the Porter stemmer for Croatian?  I could use that. :)

Otis

----- Original Message ----
From: Tomi NA <hefest@gmail.com>
To: java-user@lucene.apache.org
Sent: Thursday, July 13, 2006 8:19:31 AM
Subject: accented characters, wildcards and other problems

I've done a bit of testing with accented characters (Croatian, to be
specific) and can't really explain what I see when I explore the index
with Luke.
I've used accented characters in directory names, file names and file contents.
Now, in the list of terms (in "Top ranking terms", "Overview" tab) I
see that 2 out of 5 terms are misrepresented, but are indexed,
nonetheless.
The file names containing the problematic characters contain these
characters themselves, i.e. if the file name is "file[x].txt", the
file contents are "test[x]", where [x] represents the accented
character. What I'm not clear on is how can I see the problematic
*terms* in the list of terms, but not the documents they're stored in?

That's one issue. The other is somewhat simpler, I expect.
A search for "test*" returns no results. According to the FAQ, it
should, so what am I missing?

t.n.a.
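
One possible cause of terms looking "misrepresented" is a character set
mismatch when the files are read for indexing. Nothing in this thread
confirms that this is what happened here, so treat the following as a sketch
under that assumption. It uses only the JDK (no Lucene calls), and the class
and method names are made up for illustration. It shows how text written as
UTF-8 but decoded as ISO-8859-1 turns one accented character into two
unrelated ones, so the resulting term no longer matches the original word.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Lucene or Luke.
public class CharsetMismatchDemo {

    // Encode text with one charset and decode it with another, which is
    // what happens when a file saved in one encoding is read with a
    // reader configured for a different one.
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        byte[] bytes = text.getBytes(writtenAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // "test" followed by c with caron (U+010D), a Croatian letter.
        String original = "test\u010d";

        // Matching charsets: the text survives unchanged.
        String good = roundTrip(original,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8);

        // Mismatched charsets: the two UTF-8 bytes of the accented letter
        // are decoded as two separate Latin-1 characters.
        String bad = roundTrip(original,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println(good.equals(original)); // prints "true"
        System.out.println(bad.equals(original));  // prints "false"
        System.out.println(bad.length());          // prints "6"
    }
}
```

If this is the cause, the garbled string would be indexed as a term of its
own, and a query typed with the correct accented character would not match
it. Checking which encoding the indexing code uses to open the files would be
the first thing to verify.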

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




