On Thu, Apr 14, 2011 at 13:04, William A. Rowe Jr. <wrowe@rowe-clan.net> wrote:
With some multibyte character sets, it may be possible that '/' is one
byte of a multibyte sequence. From a Unix perspective, I presume that
it is always treated as a path separator and never as part of a
multibyte filename character.

But I just wanted to ask: is anyone aware of a system where '/' might
be treated as a valid filename character?


Wikipedia on Shift-JIS (http://en.wikipedia.org/wiki/Shift_JIS) says:

Shift JIS (also SJIS, MIME name Shift_JIS) is a character encoding for the Japanese language originally developed by a Japanese company called ASCII Corporation in conjunction with Microsoft and standardized as JIS X 0208 Appendix 1. It is based on character sets defined within JIS standards JIS X 0201:1997 (for the single-byte characters) and JIS X 0208:1997 (for the double byte characters). The lead bytes for the double byte characters are "shifted" around the 64 halfwidth katakana characters in the single-byte range 0xA1 to 0xDF. The single-byte characters 0x00 to 0x7F match the ASCII encoding, except for a yen sign at 0x5C and an overline at 0x7E in place of the ASCII character set's backslash and tilde respectively. The single-byte characters from 0xA1 to 0xDF map to the half-width katakana characters found in JIS X 0201.
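As a rough illustration of the single-byte layout described above, the hypothetical helper below classifies a byte that starts a Shift JIS character. The lead-byte ranges 0x81..0x9F and 0xE0..0xFC are the commonly cited ones and are an assumption here, not part of the quote. The loop at the end cross-checks the katakana range against Python's built-in `shift_jis` codec.

```python
def classify_sjis_byte(b: int) -> str:
    """Classify a byte found at a character boundary in Shift JIS text.

    Hypothetical helper; the lead-byte ranges (0x81..0x9F, 0xE0..0xFC) are
    the commonly cited ones, not taken from the quoted text.
    """
    if not 0 <= b <= 0xFF:
        raise ValueError("not a byte value")
    if b <= 0x7F:
        # JIS X 0201 Roman: ASCII except 0x5C (yen) and 0x7E (overline)
        return "single-byte roman"
    if 0xA1 <= b <= 0xDF:
        # JIS X 0201 half-width katakana
        return "single-byte katakana"
    if 0x81 <= b <= 0x9F or 0xE0 <= b <= 0xFC:
        # first byte of a two-byte JIS X 0208 (or user-defined) character
        return "lead byte"
    return "unused"  # 0x80, 0xA0, 0xFD..0xFF


# Cross-check against Python's shift_jis codec: each byte in 0xA1..0xDF
# decodes on its own to a half-width katakana code point (U+FF61..U+FF9F).
for b in range(0xA1, 0xE0):
    ch = bytes([b]).decode("shift_jis")
    assert classify_sjis_byte(b) == "single-byte katakana"
    assert 0xFF61 <= ord(ch) <= 0xFF9F

print(classify_sjis_byte(0x2F))  # '/' -> single-byte roman
```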

Shift JIS requires an 8-bit clean medium for transmission. It is fully backwards compatible with the legacy JIS X 0201 single-byte encoding, meaning it supports half-width katakana and that any valid JIS X 0201 string is also a valid Shift JIS string. For two-byte characters, however, Shift JIS only guarantees that the first byte will be high bit set (0x80–0xFF); the value of the second byte can be either high or low. Appearance of byte values 0x40–0x7E as second bytes of code words makes reliable Shift JIS detection difficult, because the same codes are used for ASCII characters. On the other hand, the competing 8-bit format EUC-JP, which does not support single-byte halfwidth katakana, allows for a much cleaner and direct conversion to and from JIS X 0208 code points, as all high bit set bytes are parts of a double-byte character and all codes from ASCII range represent single-byte characters.
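The byte ranges in the quoted paragraph can be checked empirically. This sketch asks Python's built-in `shift_jis` codec to encode every BMP code point, keeps the ones that come out as two bytes, and collects the lead and trail bytes it sees. The trail-byte ranges it asserts (0x40..0x7E or 0x80..0xFC) are the commonly documented ones and an assumption beyond the quote; the exact repertoire depends on Python's codec tables.

```python
def two_byte_sjis_codes() -> list[bytes]:
    """Every two-byte sequence Python's shift_jis codec emits for a BMP code point."""
    codes = []
    for cp in range(0x10000):
        try:
            encoded = chr(cp).encode("shift_jis")
        except UnicodeEncodeError:
            continue  # not representable in Shift JIS (includes lone surrogates)
        if len(encoded) == 2:
            codes.append(encoded)
    return codes


codes = two_byte_sjis_codes()
lead_bytes = {c[0] for c in codes}
trail_bytes = {c[1] for c in codes}

# Lead bytes always have the high bit set...
assert all(b >= 0x80 for b in lead_bytes)
# ...but trail bytes may also fall in the ASCII range 0x40..0x7E.
assert all(0x40 <= b <= 0x7E or 0x80 <= b <= 0xFC for b in trail_bytes)

low_trail = sorted(b for b in trail_bytes if b < 0x80)
print(f"{len(codes)} two-byte codes; low trail bytes 0x{low_trail[0]:02X}..0x{low_trail[-1]:02X}")
print("0x5C (backslash) appears as a trail byte:", 0x5C in trail_bytes)
print("0x2F (slash) appears as a trail byte:", 0x2F in trail_bytes)
```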


Given that a second byte without the high bit set is always in the range 0x40..0x7E (second paragraph), and '/' is 0x2F, there shouldn't be a problem with Shift-JIS: a 0x2F byte can only ever be the slash itself. That's not to say there isn't some other codeset where there is a problem, but I don't think it is Shift-JIS, and possibly not any of the main Japanese codesets.
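To make the conclusion concrete, here is a small sketch (the helper name and the paths are made up for illustration) showing that splitting Shift JIS bytes on 0x2F gives the same components as decoding first and splitting on '/'. For contrast it does the same with 0x5C, which is a legal trail byte: 'ソ' (0x83 0x5C) and '表' (0x95 0x5C) are well-known examples, so a byte-level split on backslash cuts those characters in half.

```python
def split_raw(raw: bytes, sep: bytes) -> list[str]:
    """Hypothetical helper: split encoded bytes first, then decode each piece."""
    return [part.decode("shift_jis", errors="replace") for part in raw.split(sep)]


# Slash: 0x2F never occurs inside a two-byte character, so the byte split is safe.
unix_path = "ソフト/表/データ.txt"
assert split_raw(unix_path.encode("shift_jis"), b"/") == unix_path.split("/")

# Backslash: 0x5C can be the trail byte of a two-byte character.
dos_path = "ソフト\\表.txt"
naive = split_raw(dos_path.encode("shift_jis"), b"\\")
print(dos_path.split("\\"))  # what a text-level split gives
print(naive)                 # the byte-level split breaks up ソ and 表
```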


--
Jonathan Leffler <jonathan.leffler@gmail.com> #include <disclaimer.h>
Guardian of DBD::Informix - v2008.0513 - http://dbi.perl.org
"Blessed are we who can laugh at ourselves, for we shall never cease to be amused."