couchdb-user mailing list archives

From Ho-Sheng Hsiao
Subject UTF-8 Support?
Date Sat, 25 Oct 2008 07:46:19 GMT

Hey all,

I'm trying to load the Unihan database (extracted from the Unicode
specification) into CouchDB. Parts of it require passing UTF-8 characters,
which I am escaping as \uXXXX sequences as described in the JSON
specification.
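
For reference, here is a minimal sketch of what \uXXXX escaping looks like, using Python's standard json module rather than whatever tool generated the Unihan file (that tool is not shown in this message). The variable names are only illustrative. One detail worth checking in a Unihan export: characters outside the Basic Multilingual Plane, such as U+20000, must be written as a surrogate pair of two \uXXXX escapes. Whether that has anything to do with the error below is not known.

```python
import json

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True).
bmp = json.dumps("\u4e03")          # U+4E03, inside the BMP
astral = json.dumps("\U00020000")   # U+20000, outside the BMP

print(bmp)     # one escape: "\u4e03"
print(astral)  # surrogate pair: "\ud840\udc00"

# Decoding the surrogate pair yields the single original character.
round_trip = json.loads(astral)
print(len(round_trip))  # 1
```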

Since the initial load has around 71,000 records, I'm using bulk
uploading via:

curl -X POST http://localhost:5984/unihan/_bulk_docs \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @data/Unihan-5.1.0.json

However, I run into this error:

[info] [<0.62.0>] HTTP Error (code 500): {'EXIT',

This error occurred on a recent trunk version as well as on the 0.8.1
tarball (sorry, I don't remember the SVN revision of the trunk build I
used). I also tried the latest trunk (r707821), but it did not compile,
so I couldn't test against it.

I don't know which record it is barfing on. Pulling a single record out:

  "unihan_version": "5.1.0",
  "unihan": {
    "kDefinition":"the original form for \u4e03 U+4E03",

Seems to work fine even with the bulk uploader.

I'm going to try inserting the records one by one. Maybe I can find out
which record it is barfing on; maybe the JSON was invalid. It seems to
me, though, that something is barfing on UTF-8 in bulk uploads over a
certain size.
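
A rough sketch of how that search could be automated by bisecting the batch instead of going strictly one record at a time. It assumes the documents are already loaded as a Python list and that the request body has the form {"docs": [...]}; the layout of the actual file is not shown here. The `upload` argument is a hypothetical callable that sends a body to _bulk_docs and returns True on success, so the HTTP call itself is left out.

```python
import json
from typing import Callable, List


def find_bad_docs(docs: List[dict], upload: Callable[[str], bool]) -> List[dict]:
    """Bisect `docs` to find the documents whose upload fails.

    `upload` is a hypothetical callable: it takes a JSON request body,
    posts it to _bulk_docs, and returns True if the server accepted it.
    """
    if not docs:
        return []
    # Assumed body shape; non-ASCII text is escaped as \uXXXX by default.
    body = json.dumps({"docs": docs})
    if upload(body):
        return []
    if len(docs) == 1:
        return docs
    mid = len(docs) // 2
    return find_bad_docs(docs[:mid], upload) + find_bad_docs(docs[mid:], upload)
```

If the full batch fails but this returns an empty list, every smaller batch was accepted, which would point toward a size-related problem rather than a single bad record. Batches that succeed are really inserted, so this is best run against a scratch database.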

If someone wants to try it out, I can supply the json file I used. Any
help is appreciated.

Ho-Sheng Hsiao, VP of Engineering
Isshen Solutions, Inc.
(334) 559-9153
