couchdb-dev mailing list archives

From Paul Davis <>
Subject Re: Character encodings and JSON RFC (spun off from COUCHDB-345)
Date Sun, 30 Aug 2009 04:48:36 GMT
>  * The editor of RFC 4627 was high.

Just letting this one hang out for a while now.

>> It still uses the troublesome meme "character encoding ... Unicode",
>> however it seems to be a stretch to read that and think that Shift-JIS,
>> ISO-8859-8, MacLatin, EBCDIC, etc are also fine and dandy.
> The RFC demonstrates conclusively that the only allowable encodings are:
>  UTF-8, UTF-16, or UTF-32

I missed this. They say fairly explicitly "JSON text SHALL be encoded
in Unicode." To me that says that if I create my own PJD-15.3 Unicode
encoding, a compliant JSON parser must support it. Obviously not gonna
happen, so we should ignore the JSON spec, and rely on our HTTP
prowess to make sure we only feed UTF-8 to the JSON parser.
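For what it's worth, "rely on our HTTP prowess" could look something like the sketch below: reject anything that declares a non-UTF-8 charset before it ever reaches the JSON parser. This is purely hypothetical (the `utf8_or_reject` name and the charset-parameter handling are my assumptions, not actual CouchDB code):

```python
def utf8_or_reject(content_type: str, body: bytes) -> bytes:
    """Hypothetical HTTP-layer guard: only let UTF-8 bodies through."""
    # Parse any ;charset=... parameter off the Content-Type header.
    params = dict(
        p.strip().split("=", 1)
        for p in content_type.split(";")[1:] if "=" in p
    )
    # Assumption: no charset parameter means UTF-8 for application/json.
    charset = params.get("charset", "utf-8").strip('"').lower()
    if charset not in ("utf-8", "us-ascii"):
        raise ValueError("415 Unsupported Media Type: " + charset)
    body.decode("utf-8")  # raises UnicodeDecodeError if malformed
    return body
```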

>   Since the first two characters of a JSON text will always be ASCII
>   characters [RFC0020], it is possible to determine whether an octet
>   stream is UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE) by looking
>   at the pattern of nulls in the first four octets.

Am I the only one who has contemplated whether this means the
byte representation of the JSON object as a whole, vs. every JSON
string in the document?

> ISO-8859-1 JSON is invalid JSON.

Forgive my ignorance, but I thought ISO-8859-1 was valid UTF-8. Or is
it just the common no-high-bit-set ASCII subset that's UTF-8
compatible? Either way, ISO-8859-1 is an encoding, not a content type.
Isn't saying it's invalid JSON like saying UCS-4 is invalid JPEG? XD
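If I understand it right, it is only the 7-bit ASCII range (bytes 0x00-0x7F) that the two share; any ISO-8859-1 byte with the high bit set is a bare continuation-range byte in UTF-8 and won't decode. A quick check:

```python
# ISO-8859-1 "é" is the single byte 0xE9, which is NOT valid UTF-8:
# in UTF-8 any byte >= 0x80 must be part of a multi-byte sequence.
raw = "é".encode("iso-8859-1")   # b'\xe9'
try:
    raw.decode("utf-8")
except UnicodeDecodeError:
    print("0xE9 alone is not valid UTF-8")

# The 7-bit ASCII subset is byte-identical in both encodings, which
# is why ASCII-only ISO-8859-1 text also parses fine as UTF-8.
assert "abc".encode("iso-8859-1") == "abc".encode("utf-8")
```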
