incubator-couchdb-user mailing list archives

From Noah Diewald <noah.diew...@gmail.com>
Subject Re: Locale and rule based view collation
Date Mon, 27 Sep 2010 00:37:29 GMT
On Sat, Sep 25, 2010 at 6:38 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
> On Sat, Sep 25, 2010 at 7:21 PM, Chris Anderson <jchris@apache.org> wrote:
>> On Sat, Sep 18, 2010 at 4:47 PM, Noah Diewald <noah.diewald@gmail.com> wrote:
>>> I was wondering if there were any plans to make use of more of the ICU
>>> collation API in CouchDB.
>>>
>>> I'm using CouchDB to make natural language documentation software and
>>> it seems like a shame that I might have to use ICU for creating sort
>>> keys to get sort orders right for view keys in certain languages when
>>> ICU is already used internally by CouchDB. It kind of looks like
>>> something could be added in at about the same place as the option for
>>> case or no case collations in couch_icu_driver.c but I feel
>>> underqualified to play around with it. I think that having an option in the
>>> view to specify collation customization would be really great and it
>>> must be something that even people working with less obscure languages
>>> than I am could benefit from.
>>>
>>
>> we definitely plan to make this configurable, just a matter of writing
>> code. for now there might be a way to set it on a per-server-instance
>> basis with environment variables. I am no expert on the topic, but I
>> vaguely recall someone mentioning this possibility.
>>
>> Chris
>>
>>> --
>>> Noah Diewald
>>>
>>
>>
>>
>> --
>> Chris Anderson
>> http://jchrisa.net
>> http://couch.io
>>
>
> I'm pretty sure that Chris is right that there's a server wide
> environment setting that affects ICU collation, but I can't say with
> any certainty.
>
> It's always been on the to-do list to provide the ability to have
> language based sorts that are defined at the view or database level,
> but as Chris points out, no one's gotten around to doing that.
> Currently the major issues would revolve around recoding the
> icu_driver to have smarts in how it's created, as well as refactoring
> how we access the driver.
>
> If we bumped our minimum Erlang VM version to R13, writing this as a
> NIF would probably be orders of magnitude easier because of resource
> types and what not.
>
> Once those hard parts are figured out, exposing it to the outside
> world should be as easy as going through the bike shedding motions on
> what the _design/doc syntax would look like.
>
> HTH,
> Paul Davis
>

It is great to know that this type of thing is on the todo list. If
custom rules were supported and not just predefined locales, some of
the questionable NIFs I'm writing to make sort keys in my application
layer could be removed some day and life would be simpler.

I don't think that the environment variables help me personally with
supporting multiple languages with different sort orders, especially
since the collation customizations for two of the languages that I'm
focusing on require custom rules. It would be really awesome if
CouchDB supported ICU custom collation rules in views right out of the
box. It might go a long way to making CouchDB a favorite with
linguists. (CouchDB should be a favorite with linguists anyway because
it is such a pleasure to use but this could make it extra favorite.)
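For readers unfamiliar with what an application-layer sort key looks like, here is a minimal sketch in Python. It does NOT use ICU; it hand-rolls the effect of a single tailoring in which the digraph "ch" is treated as one letter sorting after "h" (roughly what an ICU custom rule such as "&h < ch" expresses). The alphabet, the function name `sort_key`, and the sample words are hypothetical illustrations. The idea is that a view's map function could emit the resulting list of integers as its key instead of the raw word.

```python
from typing import List

# Hypothetical tailored alphabet: "ch" is a single letter sorting after "h".
# This only mimics one custom collation rule; it is not ICU.
ALPHABET = ["a", "b", "c", "d", "e", "f", "g", "h", "ch", "i", "j", "k",
            "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w",
            "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
MAX_LEN = max(len(letter) for letter in ALPHABET)


def sort_key(word: str) -> List[int]:
    """Map a word to a list of integer weights (usable as a JSON array key)."""
    key: List[int] = []
    text = word.lower()  # case is ignored in this sketch
    i = 0
    while i < len(text):
        # Greedy longest match, so "ch" wins over "c" followed by "h".
        for size in range(MAX_LEN, 0, -1):
            chunk = text[i:i + size]
            if chunk in RANK:
                key.append(RANK[chunk])
                i += len(chunk)
                break
        else:
            # Characters outside the tailored alphabet sort after it, by code point.
            key.append(len(ALPHABET) + ord(text[i]))
            i += 1
    return key


words = ["chlieb", "hrad", "cesta", "izba"]
print(sorted(words))                # plain code-point order
print(sorted(words, key=sort_key))  # tailored order: "ch" after "h"
```

Plain sorting puts "chlieb" before "hrad", while the tailored key yields `['cesta', 'hrad', 'chlieb', 'izba']`. Because CouchDB compares array keys element by element, emitting such integer lists as view keys should reproduce the tailored order. The obvious cost is that every language and rule set needs its own key generator outside the database, which is the burden that built-in ICU custom rules in views would remove.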

Thank you both for the replies.

-- 
Noah Diewald
