couchdb-dev mailing list archives

From Antony Blakey <antony.bla...@gmail.com>
Subject Re: API suggestions
Date Mon, 29 Dec 2008 04:16:50 GMT

On 29/12/2008, at 2:15 PM, Chris Anderson wrote:

>> Especially once CouchDB handles Unicode
>> collation properly.
>
> I wasn't aware there was a problem with CouchDB's unicode collation.
> Is there a ticket you can point me to?

No, I haven't raised it. The issue is that collation cannot be
specified per database, which IMO it needs to be, and I haven't seen
anything in the code that does anything with respect to collation,
i.e. I suspect it simply relies on the OS locale and ICU's default
handling. I haven't thought about it enough to know whether persisted
strings should be stored in a normalised form, but comparison
certainly needs to use both normalisation and a specified collation
order.
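To make the normalisation point concrete, here is a minimal Python sketch. It uses only the standard library's `unicodedata`. The choice of NFC is an assumption, and plain code-point order stands in for a real collation, because the standard library has no ICU collator. The helper names are hypothetical.

```python
import unicodedata

# The same visible word, "café", encoded two ways:
precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

def raw_equal(a: str, b: str) -> bool:
    """Plain code-point comparison with no normalisation (hypothetical helper)."""
    return a == b

def normalized_key(s: str, form: str = "NFC") -> str:
    """Normalise before comparing; NFC is an assumed choice, not a CouchDB setting."""
    return unicodedata.normalize(form, s)

print(raw_equal(precomposed, decomposed))                         # False
print(normalized_key(precomposed) == normalized_key(decomposed))  # True

# Code-point order is not a linguistic collation: "é" (U+00E9) sorts after "z".
print(sorted(["z", "\u00e9", "e"]))  # ['e', 'z', 'é']
```

Normalisation only makes canonically equivalent strings compare equal. It does not decide where "é" belongs relative to "e" and "z", which is why a specified collation order is needed as well.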

It also affects which end-of-collation-order character one uses for
prefix key searching, and it would affect the computation of
succ(string). That issue alone leads me to think that CouchDB needs to
do more in this area, because it's quite difficult to fix in the
client, whereas CouchDB is already fully Unicode-aware and uses ICU.
For example, I think the key-boundary testing API could be richer,
eliminating the need for the current key hacks, especially the use of
a high-code-point Unicode character as the upper bound of a prefix
range.
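The following Python sketch contrasts the two ways of bounding a prefix range. It assumes plain code-point order, not ICU collation, and that is precisely the simplification the message says a client cannot safely make. The sentinel `"\ufff0"` is chosen here only to illustrate the "high-numeric-value character" hack. All function names are hypothetical and are not part of any CouchDB API.

```python
from typing import Optional

MAX_CODE_POINT = 0x10FFFF

def succ_codepoint(prefix: str) -> Optional[str]:
    """Exclusive upper bound for a prefix range under code-point order:
    every s that starts with `prefix` satisfies prefix <= s < succ.
    Returns None when no such bound exists (empty or all-max prefix)."""
    chars = list(prefix)
    # A trailing maximal code point cannot be incremented; drop it and carry left.
    while chars and ord(chars[-1]) == MAX_CODE_POINT:
        chars.pop()
    if not chars:
        return None
    chars[-1] = chr(ord(chars[-1]) + 1)
    return "".join(chars)

def in_sentinel_range(key: str, prefix: str, sentinel: str = "\ufff0") -> bool:
    """The hack: treat prefix + sentinel as an inclusive end key."""
    return prefix <= key <= prefix + sentinel

def in_succ_range(key: str, prefix: str) -> bool:
    """Half-open range [prefix, succ(prefix)) under code-point order."""
    upper = succ_codepoint(prefix)
    return key >= prefix and (upper is None or key < upper)

key = "ab\U0001F600"  # a key starting with "ab" whose next code point exceeds U+FFF0
print(in_sentinel_range(key, "ab"))  # False: the sentinel bound misses it
print(in_succ_range(key, "ab"))      # True
print(succ_codepoint("ab"))          # ac
```

Under code-point order, succ is a few lines of code. Under a locale-specific ICU collation it is not, since the "next" string depends on the collation rules. That is the argument for having the server, which already has ICU, provide richer key-boundary support.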

As I say, I haven't thought about it enough to raise a ticket, but I
feel strongly that it needs to be dealt with, and I suspect it's more
obvious to me because I'm deploying in a localised environment that
uses Asian and Arabic scripts.

Antony Blakey
-------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

One should respect public opinion insofar as is necessary to avoid  
starvation and keep out of prison, but anything that goes beyond this  
is voluntary submission to an unnecessary tyranny.
   -- Bertrand Russell
