lucene-dev mailing list archives

From Doug Cutting
Subject Re: [jira] Updated: (LUCENE-1029) Illegal character replacements in ISOLatin1AccentFilter
Date Tue, 16 Oct 2007 17:50:37 GMT
Mark Miller wrote:
> My point was that Wikipedia (the link I gave and other definitions I
> saw) seems to refer to the little markings around a letter as
> diacriticals whether they mean the letter is a completely different
> letter or not (see the part mentioning Scandinavian, as well as possibly
> Webster's dictionary). Marko disputed this in his last comment, and I
> don't know that he is wrong. All I have seen seems to indicate this though.

It is confusing.  The Wikipedia page says:

   A diacritical mark or diacritic, also called an accent, is a small
   sign added to a letter to alter pronunciation or to distinguish
   between similar words.

In Swedish these are not added to a letter: they're part of the letter, 
so they're not diacritics.  Later in the page it says:

   The Scandinavian languages, by contrast, treat the characters with
   diacritics ä, ö and å as new and separate letters of the alphabet,
   and sort them after z.

Perhaps they could more properly say something like, "Scandinavian 
languages treat as separate letters things that other languages consider 
letters with diacritics".

Webster defines a diacritic as:

   a mark near or through an orthographic or phonetic character or
   combination of characters indicating a phonetic value different
   from that given the unmarked or otherwise marked element

That definition treats the diacritic as a marker, but in Swedish the dots
are no more a marker than the upright on a 'b' is a marker to pronounce it
differently from an 'o'.
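
The distinction shows up mechanically too.  Below is a minimal sketch in
Python, not Lucene's ISOLatin1AccentFilter.  It assumes that Unicode
canonical decomposition is a fair stand-in for "remove the diacritic", and
the names strip_marks, strip_marks_keep_swedish and SWEDISH_LETTERS are
made up for illustration.  A blanket filter folds ä, ö and å into a and o,
while a Swedish-aware one would have to leave them alone:

```python
import unicodedata

# Hypothetical: the letters a Swedish-aware filter would treat as letters
# in their own right rather than as a base letter plus a mark.
SWEDISH_LETTERS = frozenset("åäöÅÄÖ")


def strip_marks(text: str) -> str:
    """Decompose to NFD, then drop every combining mark."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def strip_marks_keep_swedish(text: str) -> str:
    """Like strip_marks, but leave the Swedish letters untouched."""
    # Compose first so each letter is a single code point we can look up.
    composed = unicodedata.normalize("NFC", text)
    return "".join(
        ch if ch in SWEDISH_LETTERS else strip_marks(ch) for ch in composed
    )


print(strip_marks("räksmörgås café"))
print(strip_marks_keep_swedish("räksmörgås café"))
```

The first call prints "raksmorgas cafe" and the second prints
"räksmörgås cafe".  Which behaviour is right depends on the language of
the text, which is more or less the whole argument.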

Ah, it's fun to be pedantic in the morning!

