commons-issues mailing list archives

From Cédrik LIME (JIRA) <>
Subject [jira] Updated: (LANG-285) Wish : method unaccent
Date Mon, 30 Aug 2010 12:18:53 GMT


Cédrik LIME updated LANG-285:

    Attachment: StringUtilsAccents.patch

This patch fixes the "stripAccents" performance issues and extends the logic to fall back
on sun.text.Normalizer (Java <= 1.5) or ICU4J when Java 6 is unavailable.
Please note that Unicode decomposition does not remove every "interesting" character;
notably, ligatures and curly quotes remain as is, which may not be what the bug reporter
ultimately wanted. See my previous comment for details about ASCII folding.
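
For illustration only, here is a minimal sketch of the decomposition approach on Java 6+,
using java.text.Normalizer. It is not the code from StringUtilsAccents.patch: the class name
StripAccentsSketch and the regex-based removal of combining marks are assumptions made for
this example, and the sun.text.Normalizer / ICU4J fallbacks are not shown. Non-ASCII
characters are written as Unicode escapes so the source compiles under any file encoding.

```java
import java.text.Normalizer;

// Hypothetical helper class; not part of Commons Lang or of the attached patch.
final class StripAccentsSketch {

    // Decomposes the input to NFD, then drops the combining diacritical marks
    // (block U+0300..U+036F) that the decomposition split off from base letters.
    public static String stripAccents(String input) {
        if (input == null) {
            return null;
        }
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    public static void main(String[] args) {
        // Accented Latin letters decompose into a base letter plus a combining mark.
        System.out.println(stripAccents("L'\u00e9t\u00e9 o\u00f9 j'ai d\u00fb aller \u00e0 l'\u00eele"));

        // The oe ligature (U+0153) and curly quotes (U+201C, U+201D) have no
        // canonical decomposition, so they pass through unchanged.
        System.out.println(stripAccents("\u0153uvre \u201cquoted\u201d"));
    }
}
```

Handling ligatures and curly quotes would need an explicit mapping table on top of this,
which is the ASCII folding topic mentioned above.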

> Wish : method unaccent
> ----------------------
>                 Key: LANG-285
>                 URL:
>             Project: Commons Lang
>          Issue Type: New Feature
>          Components: lang.*
>            Reporter: Guillaume Coté
>            Priority: Minor
>             Fix For: 3.0
>         Attachments: LANG-285-unaccent-using-Collator.patch, LANG-285.patch,,
StringUtilsAccents.patch, unaccent.patch,
> I would like to add a method that replaces accented characters with unaccented ones. For example,
with the input String "L'été où j'ai dû aller à l'île d'Anticosti commenca tôt", the
method would return "L'ete ou j'ai du aller a l'ile d'Anticosti commenca tot".
> I suggest calling that method unaccent and adding it to StringUtils.
> If we cannot cover all cases, the first version could cover only ISO-8859-1.
> If you are willing to go forward with that idea, I am willing to contribute a patch.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
