couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: when will utf8 handling be fixed?
Date Wed, 08 Jun 2011 16:35:57 GMT
On Wed, Jun 8, 2011 at 12:32 PM, MK <mk@cognitivedissonance.ca> wrote:
> Is there any intention to fix couch's handling of "unusual" unicode
> characters?  One of the "unusual" characters is the right single quote
> (226,128,153) which is a valid utf8 character and also not very
> "unusual" IMO.
>
> I have an interface which allows users to add and edit text in a db
> document (again, not very unusual) and this one came up because of
> someone cutting and pasting some text from a source which used the
> right single quote as an apostrophe (which is just plain common -- in
> fact they are used in the online "Definitive Guide").
>
> So I am having to maintain a switch statement which filters out these
> characters and replaces them with html entities before they get sent
> to couch, which is okay in my case since the documents are just being
> used as html pages anyway.
>
> But it's an awkward and unnecessary solution: individual
> developers should not have to be dealing with this, proper utf8
> handling should be hard coded into couch.   For one thing, it means that
> anyone worried about such "unusual" possibilities cannot use
> couchapp or couch directly -- data has to be filtered first server side.
> Although spidermonkey handles utf8 fine, depending on client side
> filtering is not always an alternative.
>
> Sincerely, MK
>
> --
> "Enthusiasm is not the enemy of the intellect." (said of Irving Howe)
> "The angel of history[...]is turned toward the past." (Walter Benjamin)
>
>

What version of CouchDB are you using, and what does an actual request look like?

A recent check on trunk shows both decoders handle your case fine:

1> mochijson2:decode(<<"\"", 226,128,153, "\"">>).
<<226,128,153>>
2> ejson:decode(<<"\"", 226,128,153, "\"">>).
<<226,128,153>>
3>
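
If it helps to rule out the client side, here is a minimal sketch in Python (my choice for illustration, not something from your setup). It checks that the same three bytes are valid UTF-8 and survive a JSON round trip unchanged. The variable names are hypothetical, and it only exercises Python's own json module, not CouchDB.

```python
import json

# The three bytes from the report: the UTF-8 encoding of U+2019
# (right single quotation mark).
raw = bytes([226, 128, 153])

# Strict decoding raises UnicodeDecodeError if the bytes are not valid UTF-8.
char = raw.decode("utf-8")

# Same JSON string literal as in the Erlang check: quote, three bytes, quote.
payload = b'"' + raw + b'"'
decoded = json.loads(payload.decode("utf-8"))

# Re-encode without ASCII escaping to confirm the bytes come back unchanged.
round_trip = json.dumps(decoded, ensure_ascii=False).encode("utf-8")

print(hex(ord(char)), decoded == char, round_trip == payload)
```

If a check like this passes but the request to CouchDB still fails, the request itself (headers, body encoding) is the next thing to look at.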
