couchdb-user mailing list archives

From Noah Slater <nsla...@apache.org>
Subject Re: bad utf8 character
Date Fri, 13 Aug 2010 12:10:16 GMT

On 13 Aug 2010, at 01:37, Kenneth Tyler wrote:

>> the tests can be a bit finicky. I'd say a 99% success rate means you have nothing
>> to worry about.
> 
> ok, thanks
> 
> looks like i'll just have to try and avoid documents with the unknown
> "bad" text in them when i import

Force UTF-8 encoding in your code before sending anything to CouchDB. Even if you think the
data you have is already UTF-8, decode it into a native Unicode representation and then
encode it back to UTF-8. Your tools should have an option to replace invalid byte sequences
with a character of your choice.
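
As a rough illustration of that round trip, here is a minimal Python sketch. The function name to_clean_utf8 and its replacement parameter are made up for this example; only the standard bytes.decode / str.encode calls are real. It assumes your import data arrives as raw bytes that are supposed to be UTF-8.

```python
def to_clean_utf8(raw: bytes, replacement: str = "\ufffd") -> bytes:
    """Round-trip bytes through Unicode so the result is always valid UTF-8.

    Hypothetical helper for illustration; not part of CouchDB or any client.
    """
    # Decode to a native Unicode string. errors="replace" substitutes
    # U+FFFD (the Unicode replacement character) for undecodable bytes
    # instead of raising UnicodeDecodeError.
    text = raw.decode("utf-8", errors="replace")

    # Optionally swap U+FFFD for a character of your choice. Caveat: this
    # also replaces any U+FFFD that was legitimately present in the input.
    if replacement != "\ufffd":
        text = text.replace("\ufffd", replacement)

    # Encode back to UTF-8; this cannot fail for the text produced above.
    return text.encode("utf-8")


if __name__ == "__main__":
    good = "café".encode("utf-8")
    bad = b"abc\xffdef"  # 0xFF can never appear in valid UTF-8
    print(to_clean_utf8(good))
    print(to_clean_utf8(bad))
    print(to_clean_utf8(bad, replacement="?"))
```

Valid input comes back unchanged, so it should be safe to run every document through something like this before the import rather than trying to find the "bad" ones first.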

