From: Paul Davis
Date: Sat, 25 Sep 2010 19:38:29 -0400
Subject: Re: Locale and rule based view collation
To: user@couchdb.apache.org

On Sat, Sep 25, 2010 at 7:21 PM, Chris Anderson wrote:
> On Sat, Sep 18, 2010 at 4:47 PM, Noah Diewald wrote:
>> I was wondering if there were any plans to make use of more of the ICU
>> collation API in CouchDB.
>>
>> I'm using CouchDB to make natural language documentation software and
>> it seems like a shame that I might have to use ICU for creating sort
>> keys to get sort orders right for view keys in certain languages when
>> ICU is already used internally by CouchDB. It kind of looks like
>> something could be added in at about the same place as the option for
>> case or no case collations in couch_icu_driver.c but I feel under
>> qualified to play around with it. I think that having an option in the
>> view to specify collation customization would be really great and it
>> must be something that even people working with less obscure languages
>> than I am could benefit from.
>>
>
> we definitely plan to make this configurable, just a matter of writing
> code. for now there might be a way to set it on a per-server-instance
> basis with environment variables. I am no expert on the topic, but I
> vaguely recall someone mentioning this possibility.
>
> Chris
>
>> --
>> Noah Diewald
>>
>
> --
> Chris Anderson
> http://jchrisa.net
> http://couch.io
>

I'm pretty sure Chris is right that there's a server-wide environment
setting that affects ICU collation, but I can't say so with any certainty.

It's always been on the to-do list to provide language-based sorts that
are defined at the view or database level, but as Chris points out, no
one's gotten around to doing that. Currently the major issues would
revolve around recoding the icu_driver to have smarts in how it's
created, as well as refactoring how we access the driver. If we bumped
our minimum Erlang VM version to R13, writing this as a NIF would
probably be orders of magnitude easier because of resource types and
whatnot.

Once those hard parts are figured out, exposing it to the outside world
should be as easy as going through the bikeshedding motions on what the
_design/doc syntax would look like.

HTH,
Paul Davis
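
For anyone needing a stopgap in the meantime, the sort-key workaround Noah
mentions can be sketched roughly as below. This is only an illustration
under stated assumptions, not anything CouchDB ships: make_sort_key and
SPANISH_TRADITIONAL are hypothetical names, the code is pure Python rather
than ICU, and it assumes the key is computed in the application and stored
in the document so a view can emit it. It leans on the fact that view
collation compares array keys element by element and numbers numerically.

```python
# Hypothetical example alphabet: traditional Spanish ordering, where the
# digraphs "ch" and "ll" count as letters of their own.
SPANISH_TRADITIONAL = [
    "a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l", "ll",
    "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z",
]


def make_sort_key(word, alphabet):
    """Map `word` to a list of integer ranks following `alphabet`.

    `alphabet` lists letters in the desired order; entries may be
    multi-character (digraphs), and the longest match wins. Characters
    missing from `alphabet` sort after every listed letter, ordered by
    code point.
    """
    rank = {letter: i for i, letter in enumerate(alphabet)}
    longest = max(len(letter) for letter in alphabet)
    text = word.lower()
    key = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first so "ch" beats "c" followed by "h".
        for size in range(min(longest, len(text) - pos), 0, -1):
            chunk = text[pos:pos + size]
            if chunk in rank:
                key.append(rank[chunk])
                pos += size
                break
        else:
            # Unknown character: place it after all listed letters.
            key.append(len(alphabet) + ord(text[pos]))
            pos += 1
    return key
```

With this alphabet, sorting ["chico", "cosa", "dama", "llama", "luz"] by
make_sort_key gives cosa, chico, dama, luz, llama, where a plain code point
sort would put chico before cosa and llama before luz. A real ICU sort key
also handles accents, case levels and normalization, which this sketch
ignores.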