couchdb-user mailing list archives

From Jim Klo <jim....@sri.com>
Subject Re: when will utf8 handling be fixed?
Date Wed, 08 Jun 2011 19:28:03 GMT
One problem that often bites me: someone forgets to include the UTF-8 charset in the Content-Type
header.  Without it, multi-byte (non-ASCII) characters can easily get mangled somewhere along the way.

With curl, the Content-Type is usually set something like this:

curl -H "Content-Type: application/json; charset=utf-8" .... 
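
For anyone not using curl, here is a rough Python sketch of the same idea. The URL, database name, and document are made up for illustration and are not from this thread. The request is only built, not sent, so it shows nothing more than the explicit UTF-8 encoding of the body and the matching charset in the header.

```python
import json
import urllib.request

# Hypothetical database/document URL; adjust for your own setup.
URL = "http://localhost:5984/mydb/mydoc"

# Sample document containing U+2019 RIGHT SINGLE QUOTATION MARK.
doc = {"text": "It\u2019s a test"}

# ensure_ascii=False keeps the character as raw UTF-8 bytes
# instead of turning it into a \u2019 escape sequence.
body = json.dumps(doc, ensure_ascii=False).encode("utf-8")

# Declare the charset explicitly so the body encoding and the header agree.
req = urllib.request.Request(
    URL,
    data=body,
    method="PUT",
    headers={"Content-Type": "application/json; charset=utf-8"},
)

# Nothing is sent here; urllib.request.urlopen(req) would perform the PUT.
```

Whether a missing charset is actually the cause of MK's problem is a guess on my part; the sketch only makes the encoding explicit on the client side.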

Jim Klo
Senior Software Engineer
Center for Software Engineering
SRI International




On Jun 8, 2011, at 9:35 AM, Paul Davis wrote:

> On Wed, Jun 8, 2011 at 12:32 PM, MK <mk@cognitivedissonance.ca> wrote:
>> Is there any intention to fix couch's handling of "unusual" unicode
>> characters?  One of the "unusual" characters is the right single quote
>> (226,128,153) which is a valid utf8 character and also not very
>> "unusual" IMO.
>> 
>> I have an interface which allows users to add and edit text in a db
>> document (again, not very unusual) and this one came up because of
>> someone cutting and pasting some text from a source which used the
>> right single quote as an apostrophe (which is just plain common -- in
>> fact they are used in the online "Definitive Guide").
>> 
>> So I am having to maintain a switch statement which filters out these
>> characters and replaces them with html entities before they get sent
>> to couch, which is okay in my case since the documents are just being
>> used as html pages anyway.
>> 
>> But it's an awkward and unnecessary solution: individual
>> developers should not have to be dealing with this, proper utf8
>> handling should be hard coded into couch.   For one thing, it means that
>> anyone worried about such "unusual" possibilities cannot use
>> couchapp or couch directly -- data has to be filtered first server side.
>> Although spidermonkey handles utf8 fine, depending on client side
>> filtering is not always an alternative.
>> 
>> Sincerely, MK
>> 
>> --
>> "Enthusiasm is not the enemy of the intellect." (said of Irving Howe)
>> "The angel of history[...]is turned toward the past." (Walter Benjamin)
>> 
>> 
> 
> What version of CouchDB are you using, and what does an actual request look like?
> 
> A recent check on trunk shows both decoders handle your case fine:
> 
> 1> mochijson2:decode(<<"\"", 226,128,153, "\"">>).
> <<226,128,153>>
> 2> ejson:decode(<<"\"", 226,128,153, "\"">>).
> <<226,128,153>>
> 3>
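
A similar sanity check can be run on the client side. This is a minimal sketch using Python's standard json module (not anything from CouchDB itself), assuming Python 3.6 or later, where json.loads accepts bytes. It confirms that the byte sequence 226,128,153 is well-formed UTF-8 for U+2019 and survives a JSON round trip.

```python
import json

# The same input as in the Erlang session: a JSON string whose
# only content is the three bytes 226, 128, 153.
raw = bytes([0x22, 226, 128, 153, 0x22])

# json.loads decodes UTF-8 bytes directly.
decoded = json.loads(raw)

# Re-encode without ASCII escaping to get the raw UTF-8 bytes back.
round_tripped = json.dumps(decoded, ensure_ascii=False).encode("utf-8")

print(decoded == "\u2019")   # True: RIGHT SINGLE QUOTATION MARK
print(round_tripped == raw)  # True: bytes are unchanged
```

If this passes on the client but the stored document still looks wrong, the damage is more likely happening in transit or in the request headers than in the JSON encoding.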

