commons-user mailing list archives

From Philippe Poulard <>
Subject [VFS] URI normalization
Date Wed, 10 Aug 2005 12:36:26 GMT
Should VFS normalize URIs before parsing a file name?

--- URI normalization ---

URI references require encoding and escaping of certain characters. The 
disallowed characters include all non-ASCII characters, plus the 
excluded characters listed in Section 2.4 of [RFC 2396], except for the 
number sign (#) and percent sign (%) characters and the square bracket 
characters re-allowed in [RFC 2732].
The excluded US-ASCII characters are:
  [00-20]    [22] [3C] [3E] [5C] [5E] [60] [7B-7D] [7F]
   C0  SPACE   "    <    >    \    ^    `   { | }   DEL

Escaping disallowed characters is performed as follows:
1. Each disallowed character is converted to UTF-8 [RFC 2279] as one or 
more bytes.
2. Any octets corresponding to a disallowed character are escaped with 
the URI escaping mechanism (that is, converted to %HH, where HH is the 
hexadecimal notation of the octet value). If escaping must be performed, 
uppercase hexadecimal characters should be used.
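3. The original character is replaced by the resulting character sequence.

Note that this normalization process is idempotent: repeated 
normalization does not change a normalized URI reference. This holds 
because the percent sign (%) is not itself a disallowed character, so 
existing %HH escapes are left untouched.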
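
To make the three steps above concrete, here is a minimal Java sketch of the 
escaping procedure. It is only an illustration under my own assumptions, not 
existing VFS code: the class name UriNormalizer and the methods isDisallowed 
and normalize are hypothetical names. Lone surrogates in the input are not 
given any special treatment.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Commons VFS.
class UriNormalizer {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // True for the characters listed above as disallowed:
    // all non-ASCII, [00-20], [22] [3C] [3E] [5C] [5E] [60] [7B-7D] [7F].
    // '#', '%', '[' and ']' are deliberately NOT in this set.
    static boolean isDisallowed(int cp) {
        if (cp > 0x7F) {
            return true;                       // non-ASCII
        }
        if (cp <= 0x20 || cp == 0x7F) {
            return true;                       // C0 controls, SPACE, DEL
        }
        switch (cp) {
            case 0x22: // "
            case 0x3C: // <
            case 0x3E: // >
            case 0x5C: // backslash
            case 0x5E: // ^
            case 0x60: // `
            case 0x7B: // {
            case 0x7C: // |
            case 0x7D: // }
                return true;
            default:
                return false;
        }
    }

    static String normalize(String uri) {
        StringBuilder out = new StringBuilder(uri.length());
        for (int i = 0; i < uri.length(); ) {
            int cp = uri.codePointAt(i);
            i += Character.charCount(cp);
            if (!isDisallowed(cp)) {
                out.appendCodePoint(cp);
                continue;
            }
            // Step 1: convert the character to UTF-8 bytes.
            byte[] utf8 = new String(Character.toChars(cp))
                    .getBytes(StandardCharsets.UTF_8);
            // Steps 2 and 3: replace it with %HH per octet, uppercase hex.
            for (byte b : utf8) {
                out.append('%');
                out.append(HEX[(b >> 4) & 0xF]);
                out.append(HEX[b & 0xF]);
            }
        }
        return out.toString();
    }
}
```

With this sketch, UriNormalizer.normalize("file:///tmp/my file.txt") yields 
"file:///tmp/my%20file.txt", and passing that result through normalize again 
returns it unchanged, since '%' is never escaped.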


Philippe Poulard
