lucene-dev mailing list archives

From Mark Miller
Subject Re: [jira] Updated: (LUCENE-1029) Illegal character replacements in ISOLatin1AccentFilter
Date Tue, 16 Oct 2007 17:13:32 GMT
I feel like a fool continuing this debate, being the least intelligent 
guy in the room, but here goes:

My point was that Wikipedia (the link I gave, and other definitions I 
saw) seems to refer to the little markings around a letter as 
diacriticals whether or not they make the letter a completely different 
letter (see the part mentioning Scandinavian, and possibly 
Webster's dictionary as well). Marko disputed this in his last comment, and I 
don't know that he is wrong. Everything I have seen seems to indicate this, though.

I also dispute this sentence in the new javadoc patch proposed:

*It will also be impossible to search for the word in its original form.*

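If you use the same analyzer at index time and query time, there should be no such problem: the query term gets its accents replaced in the same way as the indexed term, so the two still match.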
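
As a rough illustration of that point, here is a minimal Python sketch. It is not Lucene code: `fold_accents` is a hypothetical stand-in for an accent-replacing filter (it uses Unicode decomposition, which is an assumption and not how ISOLatin1AccentFilter is implemented), and `analyze`, `build_index`, and `search` are made-up helpers for a toy inverted index.

```python
import unicodedata


def fold_accents(text):
    # Hypothetical stand-in for an accent filter: decompose each character
    # (NFD) and drop the combining marks, so "ä" becomes "a".
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def analyze(text):
    # Toy analyzer: whitespace tokenizer, lowercase, then accent folding.
    return [fold_accents(token.lower()) for token in text.split()]


def build_index(docs):
    # Inverted index: analyzed term -> set of document ids.
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query):
    # The query goes through the same analyzer as the documents did.
    results = None
    for term in analyze(query):
        hits = index.get(term, set())
        results = hits if results is None else results & hits
    return results or set()


docs = {1: "Känguru hoppar", 2: "kanguru"}
index = build_index(docs)

print(search(index, "Känguru"))  # the original, accented form still finds doc 1
print(search(index, "kanguru"))  # the folded form finds the same documents
```

Searching for the original form works because the query is folded too. The cost is that "Känguru" and "kanguru" become indistinguishable, which is the conflation of 'ä' and 'a' that the rest of this thread is about.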

Doug Cutting wrote:
> Mark Miller wrote:
>> I wouldn't pretend to know the truth on this matter, but you might 
>> update the wikipedia article 
>> if you do, as it does not agree with your comments.
> Wikipedia says, "Swedish uses characters identical to a-diaeresis (ä) 
> and o-diaeresis (ö)".  This is a little ambiguous.  Identical how?  I 
> think they mean "visually identical to".  The distinction is whether 
> Swedish treats 'ä' as a variant of 'a' or as a completely separate 
> letter.  The latter is the case.
> states:
>   Swedish [...] treat[s] them as independent letters.
> Doug
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:
