couchdb-dev mailing list archives

From "Curt Arnold (JIRA)" <>
Subject [jira] Commented: (COUCHDB-345) "High ASCII" can be inserted into db but not retrieved
Date Sat, 29 Aug 2009 14:15:33 GMT


Curt Arnold commented on COUCHDB-345:

With the patch applied, the tests pass as they were intended to (asserting that the
PUT returns a 400).  The attached versions of the tests demonstrated the failure of the
GET after the PUT, so they didn't assert the 400 response.  The patch hunk for couch_httpd.erl
is a little stale and needs to be applied manually.

The patch modifies mochijson2, so it puts us in the position of diverging from stock MochiWeb.

My thought was to put a call to unicode:characters_to_binary(Bin, utf8, utf8) in the PUT code path.
If the source Bin is valid UTF-8, the return value will be identical.  If not, it returns
{error, ValidPrefix, MalformedRest}, where ValidPrefix is the part that converted cleanly and
MalformedRest starts at the first bad byte.  Support for the UTF-16 variants could be added in
the same place.  The implementation is reportedly complete as documented in R13A, but I don't know
how much, if any, of the unicode module is present in R12B5.  MochiWeb references xmerl_ucs, which
isn't in the docs but is apparently the UCS string support for the XML parser.

I'd suggest implementing a check/conversion on the PUT code path using the unicode module
and then adapting it to run on our minimum platform if that is an issue.
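
As a rough illustration of the check proposed above, here is a minimal Python sketch (not the
Erlang patch itself). The names check_utf8 and put_status, the tuple shapes, and the 201 success
code are hypothetical; they only mimic the behaviour described for
unicode:characters_to_binary/3: hand back the input unchanged when it is valid UTF-8, otherwise
report the valid prefix and the malformed remainder so the PUT can be rejected with a 400.

```python
def check_utf8(body: bytes):
    """Return ("ok", body) for valid UTF-8, else ("error", valid_prefix, malformed_rest).

    Hypothetical helper: a Python stand-in for the Erlang check discussed above.
    """
    try:
        body.decode("utf-8")  # strict decoding raises on malformed input
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte that could not be decoded
        return ("error", body[:exc.start], body[exc.start:])
    return ("ok", body)


def put_status(body: bytes) -> int:
    """Hypothetical PUT outcome: 400 for malformed UTF-8, 201 otherwise."""
    return 201 if check_utf8(body)[0] == "ok" else 400
```

With this sketch, a body containing the lone byte D8 is rejected up front, while the proper
UTF-8 encoding of the same character (C3 98) passes through untouched.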

> "High ASCII" can be inserted into db but not retrieved
> ------------------------------------------------------
>                 Key: COUCHDB-345
>                 URL:
>             Project: CouchDB
>          Issue Type: Bug
>    Affects Versions: 0.9
>         Environment: OSX 10.5.6
>            Reporter: Joan Touzet
>         Attachments: badtext.tar.gz,, reject_invalid_utf8.patch
> It is possible to PUT/POST a document into CouchDB with a "high ASCII" value that cannot
> be retrieved. This results from not escaping a non-ASCII value into \u#### when PUT/POSTing
> the document.
> The attached sample code will recreate the problem using the hex value D8 (Ø) in a possibly
> unsavoury test string.
> Sample output against 0.9.0 is as follows:
> ================================================
> {
>     "ok": true
> }
> {
>     "id": "fail", 
>     "ok": true, 
>     "rev": "1-76726372"
> }
> {
>     "error": "ucs", 
>     "reason": "{bad_utf8_character_code}"
> }
> ================================================
> Please note this defect turned up another problem, namely that the bad_utf8_character_code
> exception thrown by a design document attempting to map() the bad document caused Futon to
> fail silently in building the view, with no indication (except via debug log) that there was
> a failure. The log indicated two attempts to build the view, both failing, followed by an
> uncaught exception error for Futon.
> Based on this, there are likely other areas in the codebase that do not handle the bad_utf8_character_code
> exception correctly.
> My belief is that CouchDB shouldn't accept this input and should have rejected the PUT/POST,
> or should have escaped the input itself before the insertion.
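
To make the byte-level problem in the report concrete, the following Python sketch looks at the
single byte D8. It assumes the test string was encoded as ISO-8859-1 (Latin-1), which is
consistent with "hex value D8 (Ø)" but is not stated in the report; the helper name is_valid_utf8
is hypothetical.

```python
import json

raw = b"\xd8"  # the byte from the report; assumed to be Latin-1 for "Ø"


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = raw.decode("latin-1")     # interpret the byte as Latin-1 to recover the character
utf8_bytes = text.encode("utf-8")  # the two-byte UTF-8 form of the same character
escaped = json.dumps(text)       # ensure_ascii defaults to True, so this yields a \u escape

print(is_valid_utf8(raw))         # False
print(utf8_bytes)                 # b'\xc3\x98'
print(escaped)                    # "\u00d8"
```

On its own the byte D8 is not valid UTF-8, which matches the bad_utf8_character_code error on
retrieval; either the two-byte UTF-8 form or the \u00d8 escape would be an acceptable way to
carry the character in a JSON body.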

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
