couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Character encodings and JSON RFC (spun off from COUCHDB-345)
Date Sun, 30 Aug 2009 05:46:26 GMT
On Sun, Aug 30, 2009 at 1:42 AM, Noah Slater<nslater@apache.org> wrote:
> On Sun, Aug 30, 2009 at 01:27:18AM -0400, Paul Davis wrote:
>> I've never seen a self contained JSON parser that is compliant with
>> anything other than UTF-8. You could argue that Python's is, but it
>> forces all input to its internal Unicode representation AFAIK.
>
> Yeah, well, software sucks. Init?
>
>> > I vote for the first option.
>>
>> Patches welcome. :)
>
> I prefer being a Unicode snob on the mailing lists, kthx.
>
>> I thought you said on IRC that the RFC's detection scheme only works
>> if the BOM is specified, which is non-mandatory. If it's not mandatory
>> then it'd be a guess. Even if the major encodings can be determined,
>> I'd invent an encoding spec just to prove it's still a guess.
>
> The JSON RFC has a foolproof method that doesn't involve the BOM.
>
> I quoted this earlier in the thread.
>

And I thought I quoted you quoting something about quotes when I said
you said that foolproof method was about as foolproof as a wet cloth.
Granted, I could've misinterpreted. Curt had a link to the XML-style
detection, which also seemed to say "when clients do things that clients
do, you have to break down and try other stuff", and the end result was
"maybe possibly UTF-8, but not certain".
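
For reference, the BOM-free method in the JSON RFC (RFC 4627, section 3)
relies on the first two characters of a JSON text being ASCII, so the
pattern of null octets in the first four bytes identifies the encoding.
A minimal Python sketch of that rule follows. The function name and the
fallback for inputs shorter than four bytes are assumptions of this
sketch, not part of the RFC, and it does not handle a leading BOM or
any non-Unicode encoding, which is where the "still a guess" objection
applies.

```python
def detect_json_encoding(data: bytes) -> str:
    """Guess the Unicode encoding of a JSON text from the null-octet
    pattern of its first four bytes (RFC 4627, section 3)."""
    if len(data) < 4:
        # Assumption of this sketch: too short to match a pattern,
        # so fall back to the RFC's default encoding.
        return "utf-8"

    # True where the octet is 0x00, False otherwise.
    nulls = tuple(b == 0 for b in data[:4])

    patterns = {
        (True, True, True, False): "utf-32-be",   # 00 00 00 xx
        (True, False, True, False): "utf-16-be",  # 00 xx 00 xx
        (False, True, True, True): "utf-32-le",   # xx 00 00 00
        (False, True, False, True): "utf-16-le",  # xx 00 xx 00
    }
    # Anything else (including xx xx xx xx) is treated as UTF-8.
    return patterns.get(nulls, "utf-8")
```

Calling `detect_json_encoding('{"a": 1}'.encode("utf-16-le"))` walks the
table above and lands on the `xx 00 xx 00` row. Input that starts with a
BOM, or that is not JSON at all, falls through to the UTF-8 default, so
the result is only as reliable as the assumption that the client sent a
well-formed JSON text.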

>> UTF-8 obviously. For 16 and 32 we can obviously only accept BE
>> variants since it was sent via HTTP.
>
> I hope you didn't just use BE to mean British English.

No! Since it was transferred across a network, obviously we can only
accept the big-endian variants.
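
For anyone following along, a small Python sketch of what the byte-order
variants look like on the wire. It assumes a JSON text starting with `{"`
and uses only the standard codecs. Byte order here is a property of the
encoding the sender chose, so both BE and LE forms can show up in an
HTTP body.

```python
# The first two characters of a typical JSON object.
text = '{"'

# Print the raw bytes each Unicode encoding produces for the same text.
# The "-be" / "-le" codec names do not emit a BOM.
for name in ("utf-8", "utf-16-be", "utf-16-le", "utf-32-be", "utf-32-le"):
    print(f"{name:10} {text.encode(name).hex(' ')}")
```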
