couchdb-user mailing list archives

From Paul Davis <>
Subject Re: Locale and rule based view collation
Date Mon, 27 Sep 2010 00:43:49 GMT
On Sun, Sep 26, 2010 at 8:37 PM, Noah Diewald <> wrote:
> On Sat, Sep 25, 2010 at 6:38 PM, Paul Davis <> wrote:
>> On Sat, Sep 25, 2010 at 7:21 PM, Chris Anderson <> wrote:
>>> On Sat, Sep 18, 2010 at 4:47 PM, Noah Diewald <> wrote:
>>>> I was wondering if there were any plans to make use of more of the ICU
>>>> collation API in CouchDB.
>>>> I'm using CouchDB to make natural language documentation software and
>>>> it seems like a shame that I might have to use ICU for creating sort
>>>> keys to get sort orders right for view keys in certain languages when
>>>> ICU is already used internally by CouchDB. It kind of looks like
>>>> something could be added in at about the same place as the option for
>>>> case or no case collations in couch_icu_driver.c but I feel
>>>> underqualified to play around with it. I think that having an option in the
>>>> view to specify collation customization would be really great and it
>>>> must be something that even people working with less obscure languages
>>>> than I am could benefit from.
>>> We definitely plan to make this configurable; it's just a matter of
>>> writing code. For now there might be a way to set it on a
>>> per-server-instance basis with environment variables. I am no expert on
>>> the topic, but I vaguely recall someone mentioning this possibility.
>>> Chris
>>>> --
>>>> Noah Diewald
>>> --
>>> Chris Anderson
>> I'm pretty sure that Chris is right that there's a server wide
>> environment setting that affects ICU collation, but I can't say with
>> any certainty.
>> It's always been on the to-do list to provide the ability to have
>> language based sorts that are defined at the view or database level,
>> but as Chris points out, no one's gotten around to doing that.
>> Currently the major issues would revolve around recoding the
>> icu_driver to have smarts in how it's created, as well as refactoring
>> how we access the driver.
>> If we bumped our minimum Erlang VM version to R13, writing this as a
>> NIF would probably be orders of magnitude easier because of resource
>> types and whatnot.
>> Once those hard parts are figured out, exposing it to the outside
>> world should be as easy as going through the bike shedding motions on
>> what the _design/doc syntax would look like.
>> HTH,
>> Paul Davis
> It is great to know that this type of thing is on the todo list. If
> custom rules were supported and not just predefined locales, some of
> the questionable NIFs I'm writing to make sort keys in my application
> layer could be removed some day and life would be simpler.
> I don't think that the environment variables help me personally with
> supporting multiple languages with different sort orders, especially
> since the collation customizations for two of the languages that I'm
> focusing on require custom rules. It would be really awesome if
> CouchDB supported ICU custom collation rules in views right out of the
> box. It might go a long way to making CouchDB a favorite with
> linguists. (CouchDB should be a favorite with linguists anyway because
> it is such a pleasure to use but this could make it extra favorite.)
> Thank you both for the replies.
> --
> Noah Diewald

I'm not sure what you mean by custom rules. I'm not extremely familiar
with the collation API, but as I recall it lets a user pass in a
string-based configuration that affects the collation algorithm. Do you
need something beyond that?
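
For anyone following the thread, here is a rough sketch of the
application-layer sort key workaround mentioned above. It does not use
ICU at all; the alphabet, the word list, and the name make_sort_key are
made up for illustration. It only mimics what a tailoring rule along the
lines of "&c < ch" would express, namely that "ch" is a single letter
sorting after "c". The resulting list of ranks could, in principle, be
emitted as an array view key, since arrays collate element by element.

```python
# Hypothetical illustration only: not ICU, and not CouchDB code.
# Traditional Spanish-style alphabet where "ch" is one letter after "c".
ALPHABET = list("abc") + ["ch"] + list("defghijklmnopqrstuvwxyz")


def make_sort_key(word, alphabet):
    """Return a list of ranks, matching the longest alphabet entry first."""
    rank = {letter: i for i, letter in enumerate(alphabet)}
    longest = max(len(letter) for letter in alphabet)
    text = word.lower()
    key = []
    pos = 0
    while pos < len(text):
        # Try multi-character letters (e.g. "ch") before single characters.
        for size in range(longest, 0, -1):
            chunk = text[pos:pos + size]
            if chunk in rank:
                key.append(rank[chunk])
                pos += len(chunk)
                break
        else:
            # Characters outside the alphabet sort after every known letter.
            key.append(len(alphabet) + ord(text[pos]))
            pos += 1
    return key


WORDS = ["cuna", "chico", "dedo", "casa"]
plain_order = sorted(WORDS)
custom_order = sorted(WORDS, key=lambda w: make_sort_key(w, ALPHABET))
print(plain_order)
print(custom_order)
```

With plain code point ordering "chico" lands before "cuna"; with the
custom key it lands after, because "ch" outranks every word that starts
with a bare "c".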

Paul Davis
