couchdb-dev mailing list archives

From "Paul Joseph Davis (JIRA)" <j...@apache.org>
Subject [jira] Commented: (COUCHDB-1057) Wrong JSON parser behavior on escaped unicode characters
Date Thu, 03 Feb 2011 19:27:30 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990250#comment-12990250 ]

Paul Joseph Davis commented on COUCHDB-1057:
--------------------------------------------

Also, I realized I should probably give more background on this instead of just getting irritated
with that spec again.

The underlying issue is that CouchDB stores all of its JSON strings as UTF-8, which means
that every code point we recognize in the input is required to be representable as UTF-8. As
you can see in the JSON spec, there wasn't much foresight into what constitutes a valid Unicode
code point. This means that, via unicode escapes, the JSON spec allows for things that aren't
representable as UTF-8.
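
To make that concrete, here is a minimal sketch in Python (not CouchDB's Erlang parser, and
not necessarily the same set of code points CouchDB rejects). It uses one well-known case, an
unpaired surrogate escape, which Python's json module accepts but which cannot be encoded as
UTF-8. The helper name utf8_or_error is made up for this illustration.

```python
import json


def utf8_or_error(json_text):
    """Parse a JSON string literal, then try to encode the result as UTF-8.

    Hypothetical helper for illustration only; CouchDB's own validation
    lives in its Erlang JSON code and may apply different rules.
    """
    value = json.loads(json_text)
    try:
        return value.encode("utf-8")
    except UnicodeEncodeError as exc:
        return "not encodable: " + exc.reason


# A properly paired surrogate escape decodes to one code point (U+1F600)
# and encodes to four UTF-8 bytes.
paired = utf8_or_error(r'"\ud83d\ude00"')

# A lone surrogate escape is accepted by the JSON grammar, but the
# resulting string has no UTF-8 encoding.
lone = utf8_or_error(r'"\ud800"')

print(paired)
print(lone)
```

The parse step succeeds in both cases; only the UTF-8 encoding step distinguishes them, which
is the gap between what the JSON grammar allows and what a UTF-8 store can hold.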

When I asked about the issue on the es5-discuss list I was actually told that JSON requires
strings to be stored as sequences of 16-bit integers (hence why I'm so fond of repeating that).
Yeah, I was actually told that JSON supposedly requires a specific string implementation. Seeing as
how JSON is widely characterized as a ubiquitous exchange format, I promptly rejected that
assertion and haven't been overly motivated to relax our enforcement of valid Unicode code
points.

If someone wants to write a patch that carries invalid escapes through the system I'd probably
be ok with that, though I think we tried once and it gummed up something somewhere else.

> Wrong JSON parser behavior on escaped unicode characters
> --------------------------------------------------------
>
>                 Key: COUCHDB-1057
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1057
>             Project: CouchDB
>          Issue Type: Bug
>          Components: Database Core
>    Affects Versions: 1.0
>         Environment: Ubuntu 10.10
> Doesn't matter
>            Reporter: Fedor Indutny
>
> Try to save following doc to couchdb:
> { "_id" : "json-test", "test": "\u0080-\uffff"}
> And then put it to the database:
> curl -X PUT -d @1.json --basic --user admin:admin -H "Content-Type: application/json" http://couchdb:5984/tadagraph/json-test
> You'll get error:
> {"error":"bad_request","reason":"invalid UTF-8 JSON"}
> jsonlint ( http://www.jsonlint.com/ ) says that it's a valid JSON

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
