From: Paul Davis
Date: Thu, 28 Apr 2011 18:19:16 -0400
Subject: Re: CouchDB View Unicode Document
To: user@couchdb.apache.org

On Thu, Apr 28, 2011 at 5:57 PM, Noah Diewald wrote:
>> Can someone paste some actual input/output pairs so I have a clue
>> what's going on.
>>
>> Theoretically \uFFFF isn't a valid escape sequence last I checked
>> (don't get me started on 4627 idiocy).
>>
>> The JSON encoder will by default escape data that is non-printable
>> ascii. The few special cased characters mentioned in the JSON spec are
>> backslash escaped (\t \n \" etc) while all other bits are escaped as
>> \uHHHH sequences.
>
> What you're describing is what I'm seeing. I don't think it is a bug,
> just something I don't like because it isn't taking advantage of the
> benefits of unicode. I'd rather see the characters instead of \uHHHH
> sequences. For instance I get "\u00e9" for "é". I guess the JSON spec
> says that any character can be escaped but characters in the basic
> multilingual plane don't need to be because the string is utf8. I
> guess I feel that the benefit of utf8 is supposed to be that escaping
> these characters isn't necessary but that they'll appear in an easily
> human readable form.
> I think from what you said above that I'm not
> experiencing anything that is unexpected but I can supply some input
> and output if it is.
>
> --
> Noah Diewald
> noah.diewald.me
> noahsarchive.net
>

You are exactly correct. I think the general motivation for escaping
non-ASCII characters is to make it easier for the JSON to pass through
broken implementations that don't pay attention to possible UTF-8 in
string data. It's possible to make that sort of thing configurable, but
that would entail quite a bit of consideration on a couple of different
fronts.
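
For anyone who wants to see the two behaviours side by side, here is a
small sketch. It uses Python's standard json module purely as an
illustration, since it happens to expose this choice as a flag
(ensure_ascii). It is not CouchDB's encoder, and the sample document is
made up for the example.

```python
import json

# Hypothetical sample document: one non-ASCII character (é) plus two of the
# characters that JSON gives short backslash escapes (tab and double quote).
doc = {"name": "caf\u00e9\t\"quoted\""}

# Default behaviour in Python's json module: anything outside ASCII is
# written as a \uHHHH escape, so the output is pure ASCII.
escaped = json.dumps(doc)

# With ensure_ascii=False the é is emitted as-is; the tab and the quote are
# still backslash escaped because JSON requires that inside strings.
readable = json.dumps(doc, ensure_ascii=False)

print(escaped)
print(readable)

# Both forms decode to the same document, so the choice only affects how the
# serialized text looks and how safely it passes through ASCII-only tools.
assert json.loads(escaped) == json.loads(readable) == doc
```

The escaped form is the safer default for the reason given above: it
survives anything that mishandles UTF-8 in string data. The readable
form is only safe when every consumer decodes the bytes as UTF-8.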