couchdb-user mailing list archives

From Dave Cottlehuber <d...@muse.net.nz>
Subject Re: when will utf8 handling be fixed?
Date Wed, 08 Jun 2011 19:32:36 GMT
Thanks Jim,

nice tip which I was not aware of!

A+
Dave

On 9 June 2011 07:28, Jim Klo <jim.klo@sri.com> wrote:
> One problem that often bites me - someone forgets to include the UTF-8
> charset in the Content-Type header.  Missing that can often mangle the
> handling of high byte characters.
> When setting your Content-Type with curl, this is typically done like so:
> curl -H "Content-Type: application/json; charset=utf-8" ....
> Jim Klo
> Senior Software Engineer
> Center for Software Engineering
> SRI International
>
>
>
> On Jun 8, 2011, at 9:35 AM, Paul Davis wrote:
>
> On Wed, Jun 8, 2011 at 12:32 PM, MK <mk@cognitivedissonance.ca> wrote:
>
> Is there any intention to fix couch's handling of "unusual" unicode
> characters?  One of the "unusual" characters is the right single quote
> (226,128,153) which is a valid utf8 character and also not very
> "unusual" IMO.
>
> I have an interface which allows users to add and edit text in a db
> document (again, not very unusual) and this one came up because of
> someone cutting and pasting some text from a source which used the
> right single quote as an apostrophe (which is just plain common -- in
> fact they are used in the online "Definitive Guide").
>
> So I am having to maintain a switch statement which filters out these
> characters and replaces them with html entities before they get sent
> to couch, which is okay in my case since the documents are just being
> used as html pages anyway.
>
> But it's an awkward and unnecessary solution: individual
> developers should not have to be dealing with this, proper utf8
> handling should be hard coded into couch.   For one thing, it means that
> anyone worried about such "unusual" possibilities cannot use
> couchapp or couch directly -- data has to be filtered first server side.
> Although spidermonkey handles utf8 fine, depending on client side
> filtering is not always an alternative.
>
> Sincerely, MK
>
> --
> "Enthusiasm is not the enemy of the intellect." (said of Irving Howe)
> "The angel of history[...]is turned toward the past." (Walter Benjamin)
>
>
>
> What version of CouchDB are you using, and what does an actual request
> look like?
>
> A recent check on trunk shows both decoders handle your case fine:
>
> 1> mochijson2:decode(<<"\"", 226,128,153, "\"">>).
> <<226,128,153>>
> 2> ejson:decode(<<"\"", 226,128,153, "\"">>).
> <<226,128,153>>
> 3>
>
>
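
For anyone reading this thread later, here is a minimal Python sketch of the byte sequence being discussed. It shows that the right single quote (U+2019) encodes to the bytes 226,128,153 in UTF-8, and what those bytes look like if a receiver decodes them with a single-byte charset instead. The ISO-8859-1 fallback is an assumption made for illustration only; it is not a claim about what CouchDB or any particular HTTP stack does when the charset is missing. The names `doc`, `body` and `headers` are hypothetical.

```python
import json

# The character from the thread: RIGHT SINGLE QUOTATION MARK (U+2019).
RIGHT_SINGLE_QUOTE = "\u2019"

# Its UTF-8 encoding is the three bytes quoted in the original message.
utf8_bytes = RIGHT_SINGLE_QUOTE.encode("utf-8")
byte_values = list(utf8_bytes)

# Assumption for illustration: a receiver that does not know the body is
# UTF-8 and decodes it as ISO-8859-1 gets three characters instead of one.
mangled = utf8_bytes.decode("latin-1")

# Decoding with the correct charset recovers the original character.
round_tripped = utf8_bytes.decode("utf-8")

# A JSON body containing the character, encoded explicitly as UTF-8,
# plus a header that states the charset, as in the curl tip above.
doc = {"text": "it" + RIGHT_SINGLE_QUOTE + "s"}  # hypothetical document
body = json.dumps(doc, ensure_ascii=False).encode("utf-8")
headers = {"Content-Type": "application/json; charset=utf-8"}

# Parsing the body as UTF-8 yields the same document back.
parsed = json.loads(body.decode("utf-8"))
```

Note that `ensure_ascii=False` keeps the character as raw UTF-8 bytes in the body; with Python's default, `json.dumps` would emit the `\u2019` escape instead, which sidesteps the charset question entirely.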
