Date: Wed, 8 Jun 2011 12:32:00 -0400
From: MK
To: user@couchdb.apache.org
Subject: when will utf8 handling be fixed?

Is there any intention to fix couch's handling of "unusual" unicode characters?
One of the "unusual" characters is the right single quote (U+2019; UTF-8 bytes 226,128,153), which is a valid utf8 character and also not very "unusual" IMO. I have an interface which allows users to add and edit text in a db document (again, not very unusual), and this one came up because someone cut and pasted some text from a source which used the right single quote as an apostrophe (which is just plain common -- in fact it is used in the online "Definitive Guide").

So I am having to maintain a switch statement which filters out these characters and replaces them with html entities before they get sent to couch, which is okay in my case since the documents are just being used as html pages anyway. But it's an awkward and unnecessary solution: individual developers should not have to deal with this; proper utf8 handling should be hard coded into couch. For one thing, it means that anyone worried about such "unusual" possibilities cannot use couchapp or couch directly -- data has to be filtered first server side. Although spidermonkey handles utf8 fine, depending on client side filtering is not always an alternative.

Sincerely, MK

--
"Enthusiasm is not the enemy of the intellect." (said of Irving Howe)
"The angel of history[...]is turned toward the past." (Walter Benjamin)
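
For anyone wanting to reproduce the workaround described in the message (replacing troublesome characters with html entities before the text is sent to couch), here is a minimal sketch in Python. It is not the actual switch statement: the names ENTITY_MAP, escape_unusual and escape_non_ascii are made up for illustration, and only the right single quote entry comes from the message itself.

```python
# Hypothetical mapping: only the right single quote is taken from the
# message; extend it with whatever other characters cause trouble.
ENTITY_MAP = {
    "\u2019": "&rsquo;",  # right single quote, UTF-8 bytes 226,128,153
}


def escape_unusual(text: str) -> str:
    """Replace each mapped character with its html entity, leave the rest alone."""
    return "".join(ENTITY_MAP.get(ch, ch) for ch in text)


def escape_non_ascii(text: str) -> str:
    """Blunter alternative: turn every non-ASCII character into a numeric entity."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


if __name__ == "__main__":
    sample = "it\u2019s"
    print(escape_unusual(sample))    # named entity for the mapped character
    print(escape_non_ascii(sample))  # numeric character reference instead
```

The mapping version keeps control over exactly which characters get rewritten; the xmlcharrefreplace version needs no table to maintain but also rewrites every accented letter. Either way this is only reasonable when, as in the message, the stored text ends up being rendered as html.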