couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: CouchDB View Unicode Document
Date Thu, 28 Apr 2011 22:19:16 GMT
On Thu, Apr 28, 2011 at 5:57 PM, Noah Diewald <noah.diewald@gmail.com> wrote:
>> Can someone paste some actual input/output pairs so I have a clue
>> what's going on.
>>
>> Theoretically \uFFFF isn't a valid escape sequence last I checked
>> (don't get me started on 4627 idiocy).
>>
>> The JSON encoder will by default escape data that is non-printable
>> ascii. The few special cased characters mentioned in the JSON spec are
>> backslash escaped (\t \n \" etc) while all other bits are escaped as
>> \uHHHH sequences.
>
> What you're describing is what I'm seeing. I don't think it is a bug,
> just something I don't like because it isn't taking advantage of the
> benefits of unicode. I'd rather see the characters instead of \uHHHH
> sequences. For instance I get "\u00e9" for "é". I guess the JSON spec
> says that any character can be escaped but characters in the basic
> multilingual plane don't need to be because the string is utf8. I
> guess I feel that the benefit of utf8 is supposed to be that escaping
> these characters isn't necessary but that they'll appear in an easily
> human readable form. I think from what you said above that I'm not
> experiencing anything that is unexpected but I can supply some input
> and output if it is.
>
> --
> Noah Diewald
> noah.diewald.me
> noahsarchive.net
>

You are exactly correct. I think the general reason for escaping UTF-8
is to make it easier for the JSON to pass through broken
implementations that don't pay attention to possible UTF-8 in string
data. It's possible to make that sort of thing configurable but
that would entail quite a bit of consideration on a couple different
fronts.
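
To illustrate the trade-off, here is a minimal sketch using Python's
standard json module rather than CouchDB's own encoder (which this
does not represent). Python's encoder happens to have the same default
of escaping non-ASCII characters, plus a switch (ensure_ascii) to turn
it off. The helper name encode_both_ways is made up for this example.

```python
import json


def encode_both_ways(value):
    """Return (escaped, raw) JSON encodings of the same value.

    Hypothetical helper for illustration only; it uses Python's json
    module, not anything from CouchDB.
    """
    # Default: output is pure ASCII, non-ASCII characters become \uHHHH.
    escaped = json.dumps(value)
    # ensure_ascii=False: non-ASCII characters are emitted literally.
    raw = json.dumps(value, ensure_ascii=False)
    return escaped, raw


escaped, raw = encode_both_ways({"name": "é"})
print(escaped)  # {"name": "\u00e9"}
print(raw)      # {"name": "é"}

# Both forms decode to the same data, so the difference is only in
# readability and in how safely the bytes pass through other software.
assert json.loads(escaped) == json.loads(raw)
```

Either form is valid JSON and round-trips to the same document; the
escaped form is just safer against consumers that mishandle UTF-8.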
