From: "Chris Anderson"
To: couchdb-user@incubator.apache.org
Date: Sat, 25 Oct 2008 08:45:16 -0700
Subject: Re: UTF-8 Support?

On Sat, Oct 25, 2008 at 12:46 AM, Ho-Sheng Hsiao wrote:
>
> I don't know which record it is barfing on. Pulling a single record out:
>
> {
>   "unihan_version": "5.1.0",
>   "unihan": {
>     "kIRG_GSource":"HZ",
>     "kOtherNumeric":"7",
>     "kIRGHanyuDaZidian":"10004.020",
>     "kDefinition":"the original form for \u4e03 U+4E03",
>     "kCihaiT":"10.601",
>     "kPhonetic":"1635",
>     "kMandarin":"QI1",
>     "kCantonese":"cat1",
>     "kRSKangXi":"1.1",
>     "kHanYu":"10004.020",
>     "kRSUnicode":"1.1",
>     "kIRGKangXi":"0076.021"},
>   "_id":"U+20001"
>   }
> }
>
> Seems to work fine even with the bulk uploader.
>
> I'm going to attempt to insert the records one by one. Maybe I can find
> out which record it is barfing on, maybe the json was invalid. It seems
> to me though, that something is barfing on utf8 on bulk uploads over a
> certain limit.
>
> If someone wants to try it out, I can supply the json file I used. Any
> help is appreciated.

If you don't mind, I'll take a look at it.
The error you showed sure looks like a utf8 error, but with such a big
bulk upload it's hard to be sure. Perhaps you can put the
Unihan-5.1.0.json file online somewhere, or if you have it boiled down
to records that are causing the problem, singling those out would of
course be helpful.

Thanks,
Chris

--
Chris Anderson
http://jchris.mfdz.com
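
As an illustration of singling out problem records, here is a minimal sketch that halves a failing batch until individual records are isolated. It is not taken from the thread: the function name `find_failing_records` and the `upload_ok` callback are hypothetical, and the callback is assumed to wrap whatever bulk upload call is in use and return True on success.

```python
from typing import Callable, List, Sequence


def find_failing_records(
    records: Sequence[dict],
    upload_ok: Callable[[Sequence[dict]], bool],
) -> List[dict]:
    """Return the records that make upload_ok fail, found by halving the batch.

    upload_ok is a hypothetical callback: it should attempt the bulk upload
    for the given batch and return True if the upload succeeded.
    """
    if not records or upload_ok(records):
        return []  # empty batch, or the whole batch uploads cleanly
    if len(records) == 1:
        return [records[0]]  # a single record that fails on its own
    mid = len(records) // 2
    # The batch failed as a whole, so look for the cause in each half.
    return (find_failing_records(records[:mid], upload_ok)
            + find_failing_records(records[mid:], upload_ok))
```

If a batch fails but the function returns an empty list, every smaller piece uploaded cleanly, which would point at a size-related limit rather than at one bad record.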