couchdb-dev mailing list archives

From Paul Davis
Subject Re: Character encodings and JSON RFC (spun off from COUCHDB-345)
Date Sun, 30 Aug 2009 05:27:18 GMT
>  Can UTF-8 represent all possible Unicode code points?
> To which I would have replied "yes."

You got it.

> Any conformant JSON parser should accept UTF-8, UTF-16, and UTF-32. I mean, it's
> right there in the spec as a requirement, along with rocking horse people,
> eating marshmallow pies.

I've never seen a self-contained JSON parser that is compliant with
anything other than UTF-8. You could argue that Python's is, but it
forces all input to its internal Unicode representation AFAIK.

> If we support JSON, we either:
>  * Support UTF-8, UTF-16, and UTF-32 per the insane spec.
>  * Willfully ignore the spec and require UTF-8.
> I vote for the first option.

Patches welcome. :)

>> Assume UTF-8. If fail, maybe try guessing. If fail, throw a meatball
>> at the client saying rejected. We already ignore quite a few headers
>> and do things "Non-RESTful-ly" so I'm not too concerned.
> No need to guess. The RFC has a mechanism for determining the encoding. It also
> specifies a way to indicate the encoding. Which is totally not redundant, in any
> way. Thankfully, it doesn't tell us how to handle conflicting information in
> this respect, because of course, that would be too easy. Software is no fun when
> the specifications make things obvious.

I thought you said on IRC that the RFC's detection scheme only works
if the BOM is specified, which is not mandatory. If it's not mandatory
then it'd be a guess. Even if the major encodings can be determined,
I'd invent an encoding spec just to prove it's still a guess.
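
For reference, the detection scheme in RFC 4627 (section 3) does not rely
on a BOM: a JSON text under that RFC starts with two ASCII characters, so
the pattern of null octets in the first four octets distinguishes UTF-8,
UTF-16 and UTF-32 in either byte order. A minimal Python sketch, assuming
the input really is RFC 4627 JSON. The function name is hypothetical, and
the fallback to UTF-8 for inputs shorter than four octets is an assumption
of this sketch rather than something the RFC spells out:

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the encoding of a JSON text from the null pattern of its first four octets."""
    if len(data) < 4:
        # Assumption: anything this short is treated as UTF-8.
        return "utf-8"

    # True where the octet is 0x00, False otherwise.
    nulls = tuple(b == 0 for b in data[:4])

    # Null patterns listed in RFC 4627, section 3.
    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    # Any other pattern is treated as UTF-8.
    return patterns.get(nulls, "utf-8")
```

This only separates the five encodings the RFC names. Input in any other
encoding still falls through to UTF-8, so in that sense it remains a guess.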

>> > 2. How to treat a JSON request with a specified Content-Encoding.
>> If the encoding is understood, transcode to a UTF-8 representation.
> If it is explicit, and it is wrong, vomit in their face.

FOUURRRROHHHSIXXXXX. Oh, pardon me. Late night last night.
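
The policy discussed here (transcode when the declared encoding is
understood, reject otherwise) could be sketched in Python roughly as
follows. `RejectedBody` and `to_utf8` are hypothetical names, defaulting to
UTF-8 when nothing is declared is an assumption, and how the HTTP layer
maps the exception to a status code is left open:

```python
import codecs
from typing import Optional


class RejectedBody(Exception):
    """Hypothetical error that the HTTP layer would turn into an error response."""


def to_utf8(body: bytes, declared: Optional[str] = None) -> bytes:
    """Transcode a request body to UTF-8, or raise RejectedBody."""
    # Assumption: no declared charset means UTF-8.
    charset = declared or "utf-8"

    # Reject charsets that Python has no codec for.
    try:
        codecs.lookup(charset)
    except LookupError:
        raise RejectedBody(f"unsupported charset: {charset}")

    # Strict decoding: an explicit but wrong charset fails here.
    try:
        text = body.decode(charset)
    except (UnicodeDecodeError, LookupError):
        raise RejectedBody(f"body is not valid {charset} text")

    return text.encode("utf-8")
```

A caller would catch `RejectedBody` and answer with whatever error status
the server settles on.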

>> > What encodings would be supported?
>> Patches welcome. UTF-8 currently kinda sorta supported.
> UTF-8, UTF-16, and UTF-32

UTF-8 obviously. For 16 and 32 we can obviously only accept BE
variants since it was sent via HTTP.

> I would prefer vomiting, but any kind of humiliation works for me.

lol. More awesome quotes plzkthx.

Paul Davis
