lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Question about special characters
Date Thu, 25 May 2006 18:25:16 GMT

I think I'm missing something here. The whole point of the
ISOLatin1AccentFilter is to replace accented characters with their
unaccented equivalents, and it sounds like that's working just fine. If you
want the words in the term vector to contain the accents, why don't you
stop using that filter?
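A minimal plain-Java sketch of what accent folding does to the example word follows. It is a stand-in, not ISOLatin1AccentFilter itself. The filter uses its own hard-coded character table, while this sketch swaps in java.text.Normalizer (NFD decomposition, then stripping combining marks), so the two can differ on characters such as æ or ß. The class name AccentFolding is hypothetical.

```java
import java.text.Normalizer;

// Hypothetical helper, NOT Lucene's ISOLatin1AccentFilter: it approximates the
// same accent removal with Unicode decomposition instead of a lookup table.
final class AccentFolding {
    private AccentFolding() {}

    // Decompose each accented letter into base letter + combining mark (NFD),
    // then drop the combining marks (Unicode category M).
    static String fold(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        String original = "plàntïuç";
        // With a folding filter in the analyzer, the folded form is what gets
        // indexed, and therefore what shows up in the term vector.
        System.out.println(fold(original)); // plantiuc
        // Without the filter, the term keeps its accents (unchanged string).
        System.out.println(original);
    }
}
```

In other words, the term vector can only contain "plàntïuç" if the analyzer never folds it.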

If the problem is that you need to be able to match on both the accented
form and the unaccented form, perhaps you should have two fields, or
modify the ISOLatin1AccentFilter so that it puts both versions of the token
in the TokenStream at the same position?
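The "both versions at the same position" idea can be sketched without Lucene as follows. The PosToken and KeepBothAccentFolder classes are hypothetical models of a token stream. In Lucene the equivalent knob is the token's position increment (Token.setPositionIncrement(0) in the 2006-era API), which places a token at the same position as the one before it. Accent folding is again approximated with java.text.Normalizer rather than the filter's own table.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained model of a token: text plus position increment.
final class PosToken {
    final String text;
    final int positionIncrement;

    PosToken(String text, int positionIncrement) {
        this.text = text;
        this.positionIncrement = positionIncrement;
    }

    @Override
    public String toString() {
        return text + "(+" + positionIncrement + ")";
    }
}

// Sketch of a "keep both" folding step: emit the original token, and when
// folding changes it, also emit the unaccented form with increment 0 so both
// terms occupy the same position.
final class KeepBothAccentFolder {
    private KeepBothAccentFolder() {}

    // NFD + strip combining marks; a stand-in for ISOLatin1AccentFilter's table.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    static List<PosToken> expand(List<PosToken> input) {
        List<PosToken> out = new ArrayList<>();
        for (PosToken t : input) {
            out.add(t); // accented original keeps its own increment
            String folded = fold(t.text);
            if (!folded.equals(t.text)) {
                out.add(new PosToken(folded, 0)); // 0 = same position as previous
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<PosToken> in = List.of(new PosToken("el", 1), new PosToken("plàntïuç", 1));
        System.out.println(expand(in));
    }
}
```

With this approach the term vector holds both "plàntïuç" and "plantiuc", so a most-used-terms count sees both forms. The two-field alternative is the same text analyzed twice, once with the folding step and once without, into different field names; the most-used terms are then read from the unfolded field.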


: > The problem is special Latin characters such as à, ä, ç or ñ in
: > the text.
: > I'm currently using the ISO Latin filter, but the problem comes when I want
: > to obtain the most used terms. These terms are stored without ` ´ ^ or any
: > other "character attribute".
: > For example, "plàntïuç" (it isn't a real word) is stored as the term
: > "plantiuc".
: > How can I get the word "plàntïuç" into the term vector?
: >
: > Thanks for all replies.
: > PS: sorry if this question has already been answered somewhere, but I didn't see it.



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

