incubator-couchdb-dev mailing list archives

From "ASF subversion and git services (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COUCHDB-1425) Emitting UTF-8 chars >= 0xD800 in JS map stops design doc from indexing
Date Mon, 04 Mar 2013 20:09:13 GMT

    [ https://issues.apache.org/jira/browse/COUCHDB-1425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13592578#comment-13592578 ]

ASF subversion and git services commented on COUCHDB-1425:
----------------------------------------------------------

Commit 254d9d5830000181eeb25f7533256ba4d5740b39 in branch refs/heads/1425-fix-graceful-surrogate-handling
from [~janl]
[ https://git-wip-us.apache.org/repos/asf?p=couchdb.git;h=254d9d5 ]

Handle invalid UTF-8 byte sequences gracefully by replacing them with 0xFFFD

CouchDB's Erlang JSON parser allows invalid UTF-8 byte sequences to be stored.
The Query Server inside CouchDB fails upon encountering these byte sequences,
and the view process fails for the current batch of document updates. The result
is that the view is invariably broken. Only removing the document in question
fixes it, but finding that document is hard because the `log()` inside the
Query Server dies on the invalid byte sequence as well: our protocol is
synchronous, and map results and the `log()` messages generated alongside them
are submitted together.

This patch replaces invalid bytes with the Unicode replacement character 0xFFFD.
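
The replacement technique can be illustrated with a minimal sketch in Python.
This is not the code from the patch (which changes CouchDB's own sources); the
function name `sanitize_utf8` and the sample bytes are assumptions made for
illustration only. It shows the general idea: decode leniently, so that any
byte that is not part of a valid UTF-8 sequence becomes U+FFFD instead of
aborting the whole operation.

```python
def sanitize_utf8(raw: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for invalid sequences.

    Hypothetical helper for illustration; not part of CouchDB.
    """
    # errors="replace" makes the decoder emit U+FFFD rather than raise
    # UnicodeDecodeError when it meets an invalid byte sequence.
    return raw.decode("utf-8", errors="replace")


# 0xED 0xA0 0x80 is the byte pattern a naive encoder produces for the
# lone surrogate U+D800; strict UTF-8 decoders reject it.
bad_key = b"key-\xed\xa0\x80-end"
clean_key = sanitize_utf8(bad_key)

# The cleaned string is now safe to re-encode and pass along.
clean_bytes = clean_key.encode("utf-8")
```

The trade-off of this approach is that the original bytes are lost, but the
document can still be processed and the surrounding batch is not affected.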

Closes COUCHDB-1425.

Patch by Sam Rijs <recv@awesan.de> and Paul Davis.

Eventually, this should be fixed at the HTTP level, so that no documents
with invalid byte sequences can be written to CouchDB. The jiffy encoder
we'll get with BigCouch will do that for us. This is a fix for the releases
until then.

                
> Emitting UTF-8 chars >= 0xD800 in JS map stops design doc from indexing
> -----------------------------------------------------------------------
>
>                 Key: COUCHDB-1425
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-1425
>             Project: CouchDB
>          Issue Type: Bug
>          Components: JavaScript View Server
>    Affects Versions: 1.1.1
>         Environment: Mac OS 10.6.8, but not sure that matters.
>            Reporter: Jim Klo
>         Attachments: utf8.c.diff
>
>
> Was trying to determine UTF-8 char collation, using the following Gist: https://gist.github.com/1904807
> It turns out that once the view gets to the document that would emit "\uD800", the view server times out and stops indexing that design document.
> This seems like a bug, since I can 'store' a document with UTF-8 chars >= 0xD800, but one cannot emit a key with that char in the string.
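
Why "\uD800" can be stored but not emitted can be sketched outside CouchDB.
The snippet below uses Python's standard `json` module as a stand-in and is
only an analogy under that assumption: the helper name `encodable_as_utf8` is
hypothetical, and CouchDB's Erlang parser and JavaScript view server are
different implementations. A JSON escape such as `\ud800` parses into a lone
surrogate code point, which has no valid UTF-8 encoding, so the failure only
appears when the value has to be serialized to bytes.

```python
import json


def encodable_as_utf8(text: str) -> bool:
    """Return True if the string can be serialized as strict UTF-8.

    Hypothetical helper for illustration; not part of CouchDB.
    """
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        # Lone surrogates (U+D800..U+DFFF) have no UTF-8 representation.
        return False


# The JSON text is plain ASCII, so it is accepted on input ...
key = json.loads('"\\ud800"')

# ... but the parsed value is a lone surrogate that cannot be written
# out as UTF-8 later on.
key_is_safe = encodable_as_utf8(key)
```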

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
