couchdb-dev mailing list archives

From Adam Kocoloski <kocol...@apache.org>
Subject what to do about invalid UTF-8 in saved documents?
Date Tue, 31 Aug 2010 05:25:39 GMT
It turns out that mochijson2 will incorrectly decode an invalid UTF-8 string if the illegal
byte sequence in the string occurs after an escaped character (COUCHDB-875).  This means that
one can store documents that can never be successfully retrieved or indexed in CouchDB 1.0.
Moreover, once one of these documents makes it into a DB, a view build on that DB will never
complete.

What should we do to work around this problem?  At the very least it might make sense
for the view indexer to skip documents that contain invalid UTF-8.
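
To make the "skip" idea concrete, here is a rough sketch in Python rather than Erlang. It is
not CouchDB or mochijson2 code. The names is_valid_utf8 and indexable_docs are hypothetical,
and it assumes the indexer has the raw bytes of each document body available before decoding.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def indexable_docs(raw_docs):
    """Yield (doc_id, raw_body) pairs with valid UTF-8 bodies; skip the rest.

    raw_docs is assumed to be an iterable of (doc_id, bytes) pairs.
    """
    for doc_id, raw_body in raw_docs:
        if is_valid_utf8(raw_body):
            yield doc_id, raw_body


# An escaped character (\n) followed by the illegal byte 0xFF, the shape of
# input described above.
bad_body = b'{"v": "\\n\xff"}'
good_body = b'{"v": "ok"}'
docs = [("good", good_body), ("bad", bad_body)]
kept = list(indexable_docs(docs))
```

Checking the raw bytes before JSON decoding avoids relying on the decoder to notice the bad
sequence, at the cost of one extra pass over each document.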

Adam

