From: Noah Diewald
To: user@couchdb.apache.org
Date: Sun, 26 Sep 2010 19:37:29 -0500
Subject: Re: Locale and rule based view collation

On Sat, Sep 25, 2010 at 6:38 PM, Paul Davis wrote:
> On Sat, Sep 25, 2010 at 7:21 PM, Chris Anderson wrote:
>> On Sat, Sep 18, 2010 at 4:47 PM, Noah Diewald wrote:
>>> I was wondering if there were any plans to make use of more of the ICU
>>> collation API in CouchDB.
>>>
>>> I'm using CouchDB to make natural language documentation software and
>>> it seems like a shame that I might have to use ICU for creating sort
>>> keys to get sort orders right for view keys in certain languages when
>>> ICU is already used internally by CouchDB. It kind of looks like
>>> something could be added in at about the same place as the option for
>>> case or no case collations in couch_icu_driver.c but I feel under
>>> qualified to play around with it. I think that having an option in the
>>> view to specify collation customization would be really great and it
>>> must be something that even people working with less obscure languages
>>> than I am could benefit from.
>>>
>>
>> we definitely plan to make this configurable, just a matter of writing
>> code. for now there might be a way to set it on a per-server-instance
>> basis with environment variables.
>> I am no expert on the topic, but I
>> vaguely recall someone mentioning this possibility.
>>
>> Chris
>>
>>> --
>>> Noah Diewald
>>>
>>
>> --
>> Chris Anderson
>> http://jchrisa.net
>> http://couch.io
>>
>
> I'm pretty sure that Chris is right that there's a server-wide
> environment setting that affects ICU collation, but I can't say with
> any certainty.
>
> It's always been on the to-do list to provide the ability to have
> language-based sorts that are defined at the view or database level,
> but as Chris points out, no one's gotten around to doing that.
> Currently the major issues would revolve around recoding the
> icu_driver to have smarts in how it's created, as well as refactoring
> how we access the driver.
>
> If we bumped our minimum Erlang VM version to R13, writing this as a
> NIF would probably be orders of magnitude easier because of resource
> types and whatnot.
>
> Once those hard parts are figured out, exposing it to the outside
> world should be as easy as going through the bikeshedding motions on
> what the _design/doc syntax would look like.
>
> HTH,
> Paul Davis
>

It is great to know that this type of thing is on the to-do list. If
custom rules were supported and not just predefined locales, some of
the questionable NIFs I'm writing to make sort keys in my application
layer could be removed someday and life would be simpler.

I don't think that the environment variables help me personally with
supporting multiple languages with different sort orders, especially
since the collation customizations for two of the languages that I'm
focusing on require custom rules.

It would be really awesome if CouchDB supported ICU custom collation
rules in views right out of the box. It might go a long way to making
CouchDB a favorite with linguists. (CouchDB should be a favorite with
linguists anyway because it is such a pleasure to use, but this could
make it extra favorite.)

Thank you both for the replies.

--
Noah Diewald
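
For context, "making sort keys in the application layer" from custom
rules can look roughly like the sketch below. It is an illustration
under stated assumptions, not the code discussed in this thread: it
uses the JDK's java.text.RuleBasedCollator as a stand-in for ICU's
rule-based collator, and the rule string, class name, and method names
are all hypothetical. The idea is that each string is turned into a
binary sort key, and comparing those keys byte by byte gives the
tailored order, so the keys could be stored and sorted without the
collator at hand.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SortKeySketch {
    // Hypothetical tailoring (an assumption, not from the thread): a tiny
    // alphabet in which the digraph "ch" is one letter sorting after "c".
    static final String RULES = "< a < c < ch < d < z";

    // Builds a collator from the custom rules above.
    static RuleBasedCollator newCollator() {
        try {
            return new RuleBasedCollator(RULES);
        } catch (ParseException e) {
            throw new IllegalArgumentException("bad collation rules", e);
        }
    }

    // Returns the binary sort key for one string under the given collator.
    static byte[] sortKey(RuleBasedCollator collator, String s) {
        return collator.getCollationKey(s).toByteArray();
    }

    // Sorts words by comparing their sort keys as unsigned bytes,
    // most significant byte first, without calling collator.compare().
    static List<String> sortBySortKey(List<String> words) {
        final RuleBasedCollator collator = newCollator();
        List<String> sorted = new ArrayList<>(words);
        sorted.sort((a, b) -> Arrays.compareUnsigned(
                sortKey(collator, a), sortKey(collator, b)));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortBySortKey(Arrays.asList("d", "ch", "cz", "ca")));
    }
}
```

With rules like these, "cz" is expected to sort before "ch", which a
plain code point comparison would not do. Whether keys produced this
way keep their order once stored as view keys depends on how they are
encoded, which this sketch does not address.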