lucene-java-user mailing list archives

From "Dan Wiggin" <danut...@gmail.com>
Subject Re: Question about special characters
Date Fri, 26 May 2006 07:31:23 GMT
Thanks for the reply, but I don't know how to make this change in
ISOLatin1AccentFilter.
Can you give me some advice on how to do it?

2006/5/25, Chris Hostetter <hossman_lucene@fucit.org>:
>
>
> I think I'm missing something here.  The whole point of the
> ISOLatin1AccentFilter is to replace accented characters with their
> unaccented equivalent -- it sounds like that's working just fine. If you
> want the words in the term vector to contain the accents, why don't you
> stop using that filter?
>
> If the problem is that you need to be able to match on both the accented
> form and the unaccented form, perhaps you should have two fields, or
> modify the ISOLatin1AccentFilter so it puts both versions of the token in
> the TokenStream at the same position?
>
>
> : > The problem is special characters in the text, such as the Latin
> : > characters à, ä, ç or ñ.
> : > I currently use the ISO Latin filter, but the problem comes when I
> : > want to obtain the most-used terms. These terms are stored without
> : > ` ´ ^ or any other "character attribute".
> : > For example, "plàntïuç" (it isn't a real word) is stored as the term
> : > "plantiuc".
> : > What can I do to have the word "plàntïuç" in the term vector?
> : >
> : > Thanks for all replies.
> : > PS: Apologies if this question has been answered somewhere; I didn't
> : > see it.
>
>
>
> -Hoss
>
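
To make the "both versions of the token at the same position" suggestion above concrete, here is a minimal standalone sketch of the idea. It deliberately does not use the Lucene API: Tok, fold and expand are hypothetical names invented for illustration, and the folding is done with java.text.Normalizer rather than the mapping ISOLatin1AccentFilter itself applies, so characters without a canonical decomposition may be handled differently. The point it tries to show is only the ordering: the original accented term is emitted with a position increment of 1, and the unaccented variant, when it differs, follows with an increment of 0 so that both sit at the same position.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Standalone model of the idea; this is NOT the Lucene TokenFilter API.
final class AccentPreservingSketch {

    // Hypothetical stand-in for a token: term text plus position increment.
    static final class Tok {
        final String term;
        final int positionIncrement;

        Tok(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    // Assumption: approximate accent removal by decomposing to NFD and
    // dropping the combining marks. This is not the filter's own mapping.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                         .replaceAll("\\p{M}+", "");
    }

    // Emit each original term at a new position (increment 1). If folding
    // changes the term, also emit the folded form with increment 0, which
    // places it at the same position as the accented original.
    static List<Tok> expand(List<String> terms) {
        List<Tok> out = new ArrayList<>();
        for (String term : terms) {
            out.add(new Tok(term, 1));
            String folded = fold(term);
            if (!folded.equals(term)) {
                out.add(new Tok(folded, 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // "pl\u00e0nt\u00efu\u00e7" is the example word from the thread,
        // written with escapes to avoid source-encoding problems.
        List<String> input = Arrays.asList("pl\u00e0nt\u00efu\u00e7", "word");
        for (Tok t : expand(input)) {
            System.out.println(t.term + " (+" + t.positionIncrement + ")");
        }
    }
}
```

In a real filter the same logic would presumably live in the method that returns the next token, holding back the folded variant until the following call. With the accented form kept in the stream, the term vector would contain it, while searches on the unaccented form could still match.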
