couchdb-user mailing list archives

From Noah Diewald <noah.diew...@gmail.com>
Subject Re: CouchDB View Unicode Document
Date Thu, 28 Apr 2011 21:57:44 GMT
> Can someone paste some actual input/output pairs so I have a clue
> what's going on.
>
> Theoretically \uFFFF isn't a valid escape sequence last I checked
> (don't get me started on 4627 idiocy).
>
> The JSON encoder will, by default, escape data that is not printable
> ASCII. The few special-cased characters mentioned in the JSON spec are
> backslash-escaped (\t \n \" etc.), while all other bits are escaped as
> \uHHHH sequences.
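The escaping rule quoted above can be sketched in Python. This is a hand-rolled illustration, not CouchDB's own encoder, and the function name `escape_ascii_only` is hypothetical. It passes printable ASCII (0x20 to 0x7E) through unchanged, uses the short backslash escapes for the special-cased characters, and writes everything else as `\uHHHH`. Characters outside the Basic Multilingual Plane become a UTF-16 surrogate pair. For ordinary Python strings it should agree with `json.dumps` under its default `ensure_ascii=True`.

```python
import json

# Short escapes from the JSON string grammar (RFC 4627, section 2.5).
SHORT_ESCAPES = {
    '"': '\\"',
    '\\': '\\\\',
    '\b': '\\b',
    '\f': '\\f',
    '\n': '\\n',
    '\r': '\\r',
    '\t': '\\t',
}


def escape_ascii_only(s: str) -> str:
    """Encode s as a JSON string literal that contains only printable ASCII."""
    out = ['"']
    for ch in s:
        cp = ord(ch)
        if ch in SHORT_ESCAPES:
            out.append(SHORT_ESCAPES[ch])
        elif 0x20 <= cp <= 0x7E:
            # Printable ASCII passes through unchanged.
            out.append(ch)
        elif cp <= 0xFFFF:
            # Any other BMP code point becomes a \uHHHH sequence.
            out.append('\\u%04x' % cp)
        else:
            # Beyond the BMP: split into a UTF-16 surrogate pair.
            cp -= 0x10000
            hi = 0xD800 + (cp >> 10)
            lo = 0xDC00 + (cp & 0x3FF)
            out.append('\\u%04x\\u%04x' % (hi, lo))
    out.append('"')
    return ''.join(out)


if __name__ == "__main__":
    sample = 'café "😀"\n'
    print(escape_ascii_only(sample))  # prints "caf\u00e9 \"\ud83d\ude00\"\n"
    # Should agree with Python's own ASCII-only JSON encoder.
    assert escape_ascii_only(sample) == json.dumps(sample)
```

Under this scheme "é" comes out as `\u00e9`, which is exactly the behaviour discussed in the reply.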

What you're describing is what I'm seeing. I don't think it is a bug,
just something I don't like because it doesn't take advantage of
Unicode. I'd rather see the characters themselves than \uHHHH
sequences; for instance, I get "\u00e9" for "é". As I understand the
JSON spec, any character may be escaped, but only the quotation mark,
the backslash, and control characters have to be; because the text is
UTF-8, everything else can appear as-is. To me, the benefit of UTF-8 is
supposed to be that escaping these characters isn't necessary and that
they appear in an easily human-readable form. From what you said above,
I don't think I'm experiencing anything unexpected, but I can supply
some input and output if it turns out I am.
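The difference is easy to see with Python's standard `json` module, used here only as an illustration. It is not what CouchDB uses, and this sketch makes no claim about whether CouchDB offers a comparable setting. The escaped and literal forms parse to the same string. Whether an encoder emits `\u00e9` or `é` is purely an output choice, controlled in Python by the `ensure_ascii` parameter of `json.dumps`.

```python
import json

escaped = '"\\u00e9"'  # JSON text containing the escape sequence \u00e9
literal = '"é"'        # JSON text containing the character itself

# Both JSON texts decode to the same one-character string.
assert json.loads(escaped) == json.loads(literal) == "é"

# By default json.dumps escapes every non-ASCII character...
print(json.dumps("é"))                      # prints "\u00e9"
# ...while ensure_ascii=False keeps it human-readable.
print(json.dumps("é", ensure_ascii=False))  # prints "é"
```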

-- 
Noah Diewald
noah.diewald.me
noahsarchive.net
