     [ https://issues.apache.org/jira/browse/COUCHDB-345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Adam Kocoloski updated COUCHDB-345:
-----------------------------------

    Attachment: reject_invalid_utf8.patch

Hi folks, here's a patch that rejects updates which are not valid UTF-8. I tested it against Joan's and Mark's Python scripts, but I don't have junit so I wasn't able to run Curt's test.

> "High ASCII" can be inserted into db but not retrieved
> ------------------------------------------------------
>
>                 Key: COUCHDB-345
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-345
>             Project: CouchDB
>          Issue Type: Bug
>    Affects Versions: 0.9
>         Environment: OSX 10.5.6
>            Reporter: Joan Touzet
>         Attachments: badtext.tar.gz, enctest.zip, reject_invalid_utf8.patch
>
>
> It is possible to PUT/POST a document into CouchDB with a "high ASCII" value that cannot be retrieved. This results from not escaping a non-ASCII value into \u#### when PUT/POSTing the document.
> The attached sample code will recreate the problem using the hex value D8 (Ø) in a possibly unsavoury test string.
> Sample output against 0.9.0 is as follows:
> ================================================
> {
>     "ok": true
> }
> {
>     "id": "fail",
>     "ok": true,
>     "rev": "1-76726372"
> }
> {
>     "error": "ucs",
>     "reason": "{bad_utf8_character_code}"
> }
> ================================================
> Please note this defect turned up another problem, namely that the bad_utf8_character_code exception thrown by a design document attempting to map() the bad document caused Futon to fail silently in building the view, with no indication (except via debug log) that there was a failure. The log indicated two attempts to build the view, both failing, followed by an uncaught exception error for Futon.
> Based on this, there are likely other areas in the codebase that do not handle the bad_utf8_character_code exception correctly.
> My belief is that CouchDB shouldn't accept this input and should have rejected the PUT/POST, or should have escaped the input itself before the insertion.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
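
For anyone following along without the attachments, here is a rough Python sketch of the kind of check described above: refuse a request body that is not valid UTF-8. It is only an illustration, not the attached patch (which is against CouchDB itself); the function name is_valid_utf8 and the sample bodies are made up for this example, with the D8 byte taken from the report.

```python
import json


def is_valid_utf8(body: bytes) -> bool:
    """Return True if the raw request body decodes as strict UTF-8.

    Hypothetical helper for illustration; not part of CouchDB.
    """
    try:
        body.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# A lone 0xD8 byte is Latin-1 "Ø". In UTF-8 it is a lead byte that must be
# followed by a continuation byte, so this body is not valid UTF-8.
bad_body = b'{"test": "\xd8"}'

# The same character properly encoded as UTF-8 is the two bytes C3 98.
good_body = '{"test": "\u00d8"}'.encode("utf-8")

# Alternatively the client can escape non-ASCII as \u#### before sending;
# json.dumps does this by default (ensure_ascii=True).
escaped_body = json.dumps({"test": "\u00d8"}).encode("ascii")

print(is_valid_utf8(bad_body))      # False
print(is_valid_utf8(good_body))     # True
print(is_valid_utf8(escaped_body))  # True
```

A server applying a check like this would reject bad_body at PUT/POST time instead of storing a document that later fails with bad_utf8_character_code on read.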