couchdb-dev mailing list archives

From "Curt Arnold (JIRA)" <j...@apache.org>
Subject [jira] Updated: (COUCHDB-345) "High ASCII" can be inserted into db but not retrieved
Date Fri, 28 Aug 2009 23:38:32 GMT

     [ https://issues.apache.org/jira/browse/COUCHDB-345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Curt Arnold updated COUCHDB-345:
--------------------------------

    Attachment: enctest.zip

This is a JUnit 4 test case (with corresponding pom.xml) that demonstrates the currently broken
behavior (or at least the behavior as of about a week ago).

Documents that are not valid UTF-8 are accepted into the database, but cannot be retrieved.
I did not test whether they break queries, but have no reason to doubt that misencoded documents
would cause unexpected behavior in the database.  It seems plausible that an attacker
could seriously damage a CouchDB application by inserting misencoded documents.  Depending
on an API layer not to send misencoded documents would still leave the DB vulnerable to an
intentional attack or a miscoded API layer.

The test creates http://localhost:5984/testdb and then tries to insert 5 documents.  The first
is plain ASCII, the second contains \u00C0 - \u00C6 encoded in UTF-8, and the third contains
the same characters, but escaped instead of UTF-8 encoded.  These three behave as expected.
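
As a rough illustration of the difference between the second and third documents (this is not the attached JUnit test, and the field name "value" is made up for the example), a Python sketch of the two request bodies:

```python
import json

# Characters U+00C0 through U+00C6, as described for the test documents.
text = "".join(chr(cp) for cp in range(0xC0, 0xC7))
doc = {"value": text}  # "value" is a hypothetical field name

# Second document: raw characters, body encoded as UTF-8 (two bytes per character).
utf8_body = json.dumps(doc, ensure_ascii=False).encode("utf-8")

# Third document: characters escaped as \uXXXX, so the body is pure ASCII.
escaped_body = json.dumps(doc, ensure_ascii=True).encode("ascii")

# The bytes on the wire differ, but both bodies parse to the same document.
assert utf8_body != escaped_body
assert json.loads(utf8_body.decode("utf-8")) == doc
assert json.loads(escaped_body.decode("ascii")) == doc
```

Both forms are valid JSON, which is consistent with these inserts working as expected.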

The next two attempt to insert the same characters, but ISO-8859-1 encoded instead of UTF-8
encoded (that is, the byte sequence 0xC0, 0xC1, 0xC2 ... is in the body).  One attempt is made
with Content-Encoding=ISO-8859-1 and the other without.  Both PUTs return a 201 response,
but an attempt to fetch the document results in a 500 with an encoding error stack trace.
Returning a 400 in both cases would be appropriate, since the JSON RFC (RFC 4627) says that
JSON text is encoded in Unicode, with UTF-8 as the default.
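
A minimal sketch of the kind of check being suggested, in Python rather than CouchDB's own code. The function status_for_body and the field name "value" are hypothetical; the sketch only shows that strict UTF-8 decoding is enough to tell the two cases apart at insert time:

```python
import json

def status_for_body(body: bytes) -> int:
    """Hypothetical insert-time check: 400 unless the body is valid UTF-8 JSON."""
    try:
        json.loads(body.decode("utf-8"))  # strict decoding raises on bad bytes
    except ValueError:  # covers UnicodeDecodeError and JSON parse errors
        return 400
    return 201

text = "".join(chr(cp) for cp in range(0xC0, 0xC7))
payload = json.dumps({"value": text}, ensure_ascii=False)

utf8_body = payload.encode("utf-8")         # valid UTF-8
latin1_body = payload.encode("iso-8859-1")  # raw bytes 0xC0 ... 0xC6 in the body

print(status_for_body(utf8_body))    # accepted
print(status_for_body(latin1_body))  # rejected before anything is stored
```

Rejecting at PUT time would mean the error reaches the client that sent the bad bytes, instead of surfacing later as a 500 on fetch.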



> "High ASCII" can be inserted into db but not retrieved
> ------------------------------------------------------
>
>                 Key: COUCHDB-345
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-345
>             Project: CouchDB
>          Issue Type: Bug
>    Affects Versions: 0.9
>         Environment: OSX 10.5.6
>            Reporter: Joan Touzet
>         Attachments: badtext.tar.gz, enctest.zip
>
>
> It is possible to PUT/POST a document into CouchDB with a "high ASCII" value that cannot
> be retrieved. This results from not escaping a non-ASCII value into \u#### when PUT/POSTing
> the document.
> The attached sample code will recreate the problem using the hex value D8 (Ø) in a possibly
> unsavoury test string.
> Sample output against 0.9.0 is as follows:
> ================================================
> {
>     "ok": true
> }
> {
>     "id": "fail", 
>     "ok": true, 
>     "rev": "1-76726372"
> }
> {
>     "error": "ucs", 
>     "reason": "{bad_utf8_character_code}"
> }
> ================================================
> Please note this defect turned up another problem, namely that the bad_utf8_character_code
> exception thrown by a design document attempting to map() the bad document caused Futon to
> fail silently in building the view, with no indication (except via debug log) that there was
> a failure. The log indicated two attempts to build the view, both failing, followed by an
> uncaught exception error for Futon.
> Based on this, there are likely other areas in the codebase that do not handle the
> bad_utf8_character_code exception correctly.
> My belief is that CouchDB shouldn't accept this input and should have rejected the PUT/POST,
> or should have escaped the input itself before the insertion.
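
To illustrate the D8 case from the quoted report (a sketch only; the body and its "text" field are hypothetical and not taken from the attached sample code): a lone 0xD8 byte is a UTF-8 lead byte with no continuation byte after it, so it cannot be decoded, while the same character has two valid representations.

```python
import json

# Hypothetical request body with the byte D8 sent as-is ("high ASCII").
raw = b'{"text": "\xd8"}'
try:
    raw.decode("utf-8")
    decodes = True
except UnicodeDecodeError:
    decodes = False  # D8 is followed by '"', which is not a continuation byte

# Two valid ways to send the same character (U+00D8):
utf8_form = "\u00d8".encode("utf-8")  # the two bytes C3 98
escaped_form = json.dumps("\u00d8")   # ASCII-only JSON string using a \u escape

print(decodes, utf8_form.hex(), escaped_form)
```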

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

