lucene-dev mailing list archives

From "Marko Asplund (JIRA)" <>
Subject [jira] Created: (LUCENE-1029) Illegal character replacements in ISOLatin1AccentFilter
Date Mon, 15 Oct 2007 07:26:51 GMT
Illegal character replacements in ISOLatin1AccentFilter

                 Key: LUCENE-1029
             Project: Lucene - Java
          Issue Type: Bug
          Components: Analysis
    Affects Versions: 2.2
            Reporter: Marko Asplund

The ISOLatin1AccentFilter class is responsible for replacing "accented characters in the ISO
Latin 1 character set by their unaccented equivalent".

Some of the replacements performed for Scandinavian characters (used e.g. in the Finnish,
Swedish and Danish languages) are illegal. The Scandinavian characters differ from the
accented characters used in Latin-based languages such as French in that these characters
(ä, ö, å) represent entirely independent sounds in the language and therefore cannot be
replaced by any other sound without changing the meaning. It is therefore illegal to replace
these characters with any other character.

This means, for example, that you can't change the Finnish word sää (weather) to saa (will
have), because these are two entirely different words with different meanings. The same applies
to the Scandinavian languages.

There is no connection between the sounds represented by ä and a, ö and o, or å and a.
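To illustrate the collision, here is a minimal plain-Java sketch. It does not use Lucene: as an assumption, it stands in for the filter's replacement table with java.text.Normalizer NFD decomposition followed by removal of combining marks (Java 6 or later). The class name FoldingCollision and its fold method are hypothetical.

```java
import java.text.Normalizer;

// Illustration only: approximates accent folding with Unicode NFD decomposition
// followed by removal of combining marks (requires Java 6+). This is NOT the
// replacement table used by Lucene's ISOLatin1AccentFilter.
public class FoldingCollision {

    // Decompose the string (e.g. U+00E4 'a with diaeresis' -> 'a' + U+0308)
    // and delete every combining mark, leaving only the base letters.
    static String fold(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String weather = "s\u00e4\u00e4"; // 's' + two U+00E4: the Finnish word for "weather"
        String other = "saa";             // a different Finnish word
        System.out.println(fold(weather));                     // prints saa
        System.out.println(fold(weather).equals(fold(other))); // prints true: the words collide
    }
}
```

Once both words fold to the same string, an index built this way can no longer tell them apart. Note that Unicode defines no decomposition for ø or æ, so this particular stand-in leaves them unchanged; a table-driven filter can still map them explicitly.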

In addition to the three characters mentioned above, Danish and Norwegian use other special
characters such as ø and æ. It should be checked whether replacing these characters is legal as well.
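One way to express "leave these letters alone" is a folding step with an explicit exclusion set. The sketch below is hypothetical and is not Lucene code: the class NordicAwareFolding, the PRESERVED set and the NFD-based folding are all assumptions for illustration. It preserves ä, ö and å as the report requests, and also ø and æ, pending the check suggested above for those two letters.

```java
import java.text.Normalizer;

// Hypothetical sketch, not Lucene code: fold accents on most letters, but keep
// the letters named in this report unchanged.
public class NordicAwareFolding {

    // Letters to preserve, lower and upper case: a-umlaut, o-umlaut, a-ring,
    // o-slash, ae (U+00E4 U+00F6 U+00E5 U+00F8 U+00E6 and their capitals).
    // Written as escapes so the file compiles under any source encoding.
    private static final String PRESERVED =
        "\u00e4\u00f6\u00e5\u00f8\u00e6\u00c4\u00d6\u00c5\u00d8\u00c6";

    static String fold(String s) {
        // Compose first so that e.g. 'a' + U+0308 is seen as the single letter U+00E4.
        String composed = Normalizer.normalize(s, Normalizer.Form.NFC);
        StringBuilder out = new StringBuilder(composed.length());
        for (int i = 0; i < composed.length(); ) {
            int cp = composed.codePointAt(i);
            i += Character.charCount(cp);
            String letter = new String(Character.toChars(cp));
            if (PRESERVED.indexOf(cp) >= 0) {
                out.append(letter); // independent letter in these languages: keep it
            } else {
                // Decompose and drop combining marks, e.g. U+00E9 -> 'e'.
                String d = Normalizer.normalize(letter, Normalizer.Form.NFD);
                out.append(d.replaceAll("\\p{M}", ""));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("s\u00e4\u00e4").equals("s\u00e4\u00e4")); // prints true
        System.out.println(fold("caf\u00e9"));                              // prints cafe
    }
}
```

With this approach sää and saa stay distinct, while accents on letters such as the French é are still removed. Composing to NFC first means decomposed input (a base letter followed by a combining diaeresis) is treated the same as the precomposed form.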

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

