commons-issues mailing list archives

From "Benjamin Bentmann (JIRA)" <>
Subject [jira] Commented: (IO-167) Fix case-insensitive string handling
Date Sat, 24 May 2008 17:07:55 GMT


Benjamin Bentmann commented on IO-167:

bq. I don't believe the FileSystemUtils changes will make any difference to their operation
I'm not sure whether you did not read the mail post I mentioned or whether it just wasn't clear
enough, so I will try to explain again. The correctness of {{FileSystemUtils}} depends on its ability
to correctly detect the underlying OS. This detection is based on recognizing known OS
names, which - for resiliency - is intended to be case-insensitive. If you're familiar with
the Unicode standard, you will remember that character casing for non-English languages is
non-trivial. As just one example, the Turkish language defines the lower case form
of "I" to be "ı" (dotless i). In other words, if a JVM runs with the Turkish locale and the
system property "" returns "IRIX", "UNIX", "MPE/IX" or "SOLARIS", the unpatched {{FileSystemUtils}}
will not detect the OS. As a consequence, {{freeSpaceOs()}} fails with an exception.
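
To make the failure mode concrete, here is a minimal sketch. The method {{nameContains()}} is a hypothetical stand-in for a "lower-case, then search" check, not the actual {{FileSystemUtils}} code, and the Turkish locale is passed explicitly instead of relying on the JVM default.

```java
import java.util.Locale;

public class TurkishCaseDemo {

    // Hypothetical stand-in for a "lower-case, then search" OS name check.
    // This is NOT the actual FileSystemUtils code, only the same pattern.
    static boolean nameContains(String osName, String token, Locale locale) {
        return osName.toLowerCase(locale).contains(token);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");

        // Under Turkish rules "I" lower-cases to U+0131 (dotless i),
        // so the ASCII token "unix" is no longer found in the result.
        System.out.println(nameContains("UNIX", "unix", turkish));

        // With a fixed locale the match does not depend on the user's settings.
        System.out.println(nameContains("UNIX", "unix", Locale.ENGLISH));
    }
}
```

The same applies to any OS name containing an upper case "I", which is why all four names listed above are affected.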

So when you doubt the patch will make a difference to the operation, is that because you believe
the outlined preconditions will never occur or because an exception doesn't make a difference
to you?

bq. the package-private IOCase convertCase() method is only used by the FilenameUtils's wildcardMatch()
Just one question for my own understanding: Is {{wildcardMatch()}} meant to be platform-dependent?
In other words, would it be considered correct for the method if a call with argument {{IOCase.INSENSITIVE}}
returns different matches based on the user's locale?

bq. it seems wrong to me to hard-code English in principle
"believe", "seems"... with all respect, correctness is not a matter of gut feeling. I have
no problem if somebody proves me wrong, but such a proof must be based on specs, APIs or
otherwise authoritative materials.

From the API docs for [{{String.toLowerCase()}}|]:
bq. To obtain correct results for locale insensitive strings, use toLowerCase(Locale.ENGLISH)

I believe that file names should be understood as locale insensitive strings, as a matter
of interoperability, but that assumption might be wrong.

Using the English locale for the case conversion will not limit the code to ASCII characters,
if this was your concern. It will merely fix the behavior of {{to*erCase()}} to platform-independent
conversion rules. If you look at the source code for {{to*erCase()}} you will notice that
it has an {{if}} for the languages "tr", "az" and "lt". The selection of Locale.ENGLISH is
quite arbitrary; Locale.GERMAN or Locale.FRENCH would work equally well. The key point is to
avoid the {{if}} regardless of the user's locale.
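
A small sketch of what "locking the locale" looks like in practice. The helper names {{lowerFixed()}} and {{sameInEnglishGermanFrench()}} are made up for this example, and the default locale is switched to Turkish only to simulate an affected user.

```java
import java.util.Locale;

public class FixedLocaleDemo {

    // Lower-cases with a fixed locale so the result does not depend on
    // Locale.getDefault(). The helper name is made up for this example.
    static String lowerFixed(String s) {
        return s.toLowerCase(Locale.ENGLISH);
    }

    // Checks that the three locales mentioned above agree on the lower case form.
    static boolean sameInEnglishGermanFrench(String s) {
        String en = s.toLowerCase(Locale.ENGLISH);
        return en.equals(s.toLowerCase(Locale.GERMAN))
            && en.equals(s.toLowerCase(Locale.FRENCH));
    }

    public static void main(String[] args) {
        Locale saved = Locale.getDefault();
        try {
            // Simulate a user whose JVM runs with the Turkish locale.
            Locale.setDefault(Locale.forLanguageTag("tr-TR"));
            // The no-argument variant follows the default locale ...
            System.out.println("SOLARIS".toLowerCase().equals("solaris"));
            // ... while the fixed-locale variant ignores it.
            System.out.println(lowerFixed("SOLARIS").equals("solaris"));
        } finally {
            Locale.setDefault(saved);
        }
        System.out.println(sameInEnglishGermanFrench("MPE/IX"));
    }
}
```

Non-ASCII input is still converted by {{lowerFixed()}}; only the language-specific special cases are taken out of the picture.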

Back to Unicode: case conversions can be defined in terms of isolated 1:1 character mappings
or context-sensitive m:n mappings that match some written language. In most cases (e.g. when
you don't want to produce text for human consumption), Java code seeks platform-independence,
which implies locale-independence. Unicode offers this via the 1:1 character mappings, available
via {{Character.to*erCase()}} and {{String.equalsIgnoreCase()}}. If one wants to approximate
this behavior using {{String.to*erCase()}}, one must lock the locale.
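
The difference between the two kinds of mapping can be sketched as follows. {{equalsIgnoreCaseByChar()}} is a hypothetical helper written for this example; it only roughly mirrors the per-character comparison that {{String.equalsIgnoreCase()}} documents, and the German sharp s is used as an illustrative 1:n case.

```java
import java.util.Locale;

public class CharMappingDemo {

    // Case-insensitive comparison built only on the 1:1 char mappings.
    // Hypothetical helper; it takes no Locale and never consults the default one.
    static boolean equalsIgnoreCaseByChar(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            char x = a.charAt(i);
            char y = b.charAt(i);
            if (Character.toUpperCase(x) != Character.toUpperCase(y)
                    && Character.toLowerCase(x) != Character.toLowerCase(y)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 1:1 mappings: locale plays no role here.
        System.out.println(equalsIgnoreCaseByChar("UNIX", "unix"));
        System.out.println("UNIX".equalsIgnoreCase("unix"));

        // m:n mapping: the String method turns U+00DF (sharp s) into "SS",
        // while the char method has no single-char upper case form to return.
        System.out.println("\u00DF".toUpperCase(Locale.ENGLISH));
        System.out.println(Character.toUpperCase('\u00DF') == '\u00DF');
    }
}
```

This is also why the {{String}} methods with a locked locale only approximate the 1:1 mappings: they still apply the locale-independent m:n rules.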

> Fix case-insensitive string handling
> ------------------------------------
>                 Key: IO-167
>                 URL:
>             Project: Commons IO
>          Issue Type: Bug
>    Affects Versions: 1.4
>            Reporter: Benjamin Bentmann
>         Attachments: IO-167.patch
> Case-insensitive operations are currently platform-dependent, please see [Common Bug #3|] for details.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
