incubator-couchdb-user mailing list archives

From "Chris Anderson" <jch...@apache.org>
Subject Re: UTF-8 Support?
Date Sat, 25 Oct 2008 15:45:16 GMT
On Sat, Oct 25, 2008 at 12:46 AM, Ho-Sheng Hsiao <hosh@isshen.com> wrote:
>
> I don't know which record it is barfing on. Pulling a single record out:
>
> {
>  "unihan_version": "5.1.0",
>  "unihan": {
>    "kIRG_GSource":"HZ",
>    "kOtherNumeric":"7",
>    "kIRGHanyuDaZidian":"10004.020",
>    "kDefinition":"the original form for \u4e03 U+4E03",
>    "kCihaiT":"10.601",
>    "kPhonetic":"1635",
>    "kMandarin":"QI1",
>    "kCantonese":"cat1",
>    "kRSKangXi":"1.1",
>    "kHanYu":"10004.020",
>    "kRSUnicode":"1.1",
>    "kIRGKangXi":"0076.021"},
>    "_id":"U+20001"
>  }
> }
>
> Seems to work fine even with the bulk uploader.
>
> I'm going to attempt to insert the records one by one. Maybe I can find
> out which record it is barfing on, maybe the json was invalid. It seems
> to me though, that something is barfing on utf8 on bulk uploads over a
> certain limit.
>
> If someone wants to try it out, I can supply the json file I used. Any
> help is appreciated.

If you don't mind, I'll take a look at it. The error you showed sure
looks like a utf8 error, but with such a big bulk upload it's hard to
be sure.

Perhaps you can put the Unihan-5.1.0.json file online somewhere, or if
you have it boiled down to records that are causing the problem,
singling those out would of course be helpful.
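
One way the narrowing-down could go is sketched below. This is only an illustration, not CouchDB code: `upload` is a hypothetical callable you would supply yourself (for example, a wrapper that posts a batch to a scratch database and raises on an error response), and `minimal_failing_batches` is a name made up for this example. It halves a failing batch repeatedly. If a batch fails but both of its halves succeed, the batch is reported whole, which would point at a size-dependent problem rather than a single bad record.

```python
from typing import Any, Callable, Dict, List, Sequence

Record = Dict[str, Any]


def minimal_failing_batches(
    records: Sequence[Record],
    upload: Callable[[Sequence[Record]], None],
) -> List[List[Record]]:
    """Return the smallest batches for which `upload` still raises.

    `upload` is a hypothetical, caller-supplied function that raises an
    exception when a batch is rejected. It is not a CouchDB API.
    """

    def fails(batch: Sequence[Record]) -> bool:
        try:
            upload(batch)
            return False
        except Exception:
            return True

    def narrow(batch: Sequence[Record]) -> List[List[Record]]:
        # Precondition: `batch` is already known to fail.
        if len(batch) == 1:
            return [list(batch)]
        mid = len(batch) // 2
        halves = [batch[:mid], batch[mid:]]
        failing = [half for half in halves if fails(half)]
        if not failing:
            # Only the combined batch fails, so the failure depends on
            # batch size (or on records interacting), not on one record.
            return [list(batch)]
        found: List[List[Record]] = []
        for half in failing:
            found.extend(narrow(half))
        return found

    records = list(records)
    if records and fails(records):
        return narrow(records)
    return []
```

Each successful trial upload really does store its records, so this is best run against a throwaway database. A result made of single-record batches identifies the bad records; a result made of larger batches suggests the size limit is the real cause.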

Thanks,
Chris

-- 
Chris Anderson
http://jchris.mfdz.com
