couchdb-dev mailing list archives

From Antony Blakey
Subject Re: Unicode normalization (was Re: The 1.0 Thread)
Date Tue, 23 Jun 2009 22:51:12 GMT

On 24/06/2009, at 1:57 AM, Paul Davis wrote:

> Are there byte order semantics for UTF-8?

No, UTF-8 is independent of byte ordering because it's a byte stream.
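To illustrate the point, here is a minimal Python sketch. The sample character is an arbitrary choice and not something from the thread. It shows that UTF-16 has two possible byte layouts for the same text, while UTF-8 has only one.

```python
# Arbitrary sample text (an assumption for illustration): U+00E9, "é".
text = "\u00e9"

# UTF-16 uses 16-bit code units, so the two bytes of each unit can be
# stored in either order depending on endianness.
utf16_be = text.encode("utf-16-be")  # big-endian layout
utf16_le = text.encode("utf-16-le")  # little-endian layout

# UTF-8 uses 8-bit code units, so there is nothing to reorder.
utf8 = text.encode("utf-8")

print(utf16_be.hex(), utf16_le.hex(), utf8.hex())
```

The two UTF-16 results contain the same bytes in opposite order, whereas the UTF-8 result is the same on every platform.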

> Or other cases where sorting
> by UTF-8 binary representation is going to cause issues? Remember that
> the end goal is to create deterministic serializations for hashing.

Sorting over the UTF-8 bytes is fine for this.
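As a rough sketch of what that could look like, the Python below sorts the keys of a flat object by their UTF-8 bytes before hashing. The function name `canonical_hash`, the use of JSON as the serialization, and SHA-256 are all assumptions made for illustration; none of them is taken from CouchDB.

```python
import hashlib
import json


def canonical_hash(obj: dict) -> str:
    """Hash a flat dict so that key insertion order does not matter.

    Hypothetical helper: keys are ordered by their UTF-8 byte
    representation, then the pairs are serialized and hashed.
    """
    # Order the (key, value) pairs by the UTF-8 bytes of the key.
    items = sorted(obj.items(), key=lambda kv: kv[0].encode("utf-8"))
    # Serialize compactly and without ASCII escaping, so the output
    # depends only on the data.
    payload = json.dumps(items, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


first = canonical_hash({"b": 1, "\u00e9": 2, "a": 3})
second = canonical_hash({"\u00e9": 2, "a": 3, "b": 1})
print(first == second)
```

This sketch only handles a single level of keys; nested objects would need the same ordering applied recursively, and it does nothing about Unicode normalization of the keys themselves.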

> Sorting by code point doesn't seem like it'd get us anything other
> than added complexity.

Agreed, because you would have to deal with surrogates in UTF-16.
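For what it's worth, UTF-8 byte order already coincides with code point order, so nothing is lost by sorting on the bytes. It is UTF-16 code unit order that diverges once surrogate pairs appear. A small Python sketch, with sample strings chosen arbitrarily for illustration:

```python
def utf8_key(s: str) -> bytes:
    # Compare strings by their UTF-8 byte sequence.
    return s.encode("utf-8")


def codepoint_key(s: str) -> list:
    # Compare strings by their sequence of Unicode code points.
    return [ord(c) for c in s]


def utf16_key(s: str) -> list:
    # Compare strings by their sequence of 16-bit UTF-16 code units.
    raw = s.encode("utf-16-be")
    return [int.from_bytes(raw[i:i + 2], "big") for i in range(0, len(raw), 2)]


# U+10000 needs a surrogate pair in UTF-16; U+FFFF does not.
samples = ["\U00010000", "\uffff", "z", "\u00e9", "a"]

by_utf8 = sorted(samples, key=utf8_key)
by_codepoint = sorted(samples, key=codepoint_key)
by_utf16 = sorted(samples, key=utf16_key)

print(by_utf8 == by_codepoint)   # UTF-8 bytes agree with code points
print(by_utf16 == by_codepoint)  # UTF-16 code units do not
```

In UTF-16 the surrogate pair for U+10000 starts with the code unit 0xD800, which sorts below 0xFFFF, so U+10000 lands before U+FFFF even though its code point is higher.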

> Patches welcome.

Of course. Let me qualify that by saying I have no time to do this;
I'm merely taking part in the discussion.

Antony Blakey
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

When I hear somebody sigh, 'Life is hard,' I am always tempted to ask,  
'Compared to what?'
   -- Sydney Harris
