couchdb-dev mailing list archives

From: Noah Slater <nsla...@apache.org>
Subject: Re: Unicode normalization (was Re: The 1.0 Thread)
Date: Mon, 22 Jun 2009 14:36:37 GMT

On Sun, Jun 21, 2009 at 11:21:00PM -0700, Chris Anderson wrote:
> My gut reaction is that normalizing strings using NFC [1] is not appropriate
> for a database. Here's why we should treat strings as binary and not worry
> about unicode normalization at all:
[...]
> First of all, I'm certain we can't require that all input already be NFC
> normalized.
[...]
> Secondly, we're a database, so I find highly suspicious the notion that we
> should auto-normalize user input on-the-quiet.
[...]
> So we can't require normalized input and we can't auto-normalize.

CouchDB would create a canonicalised copy of the document while creating the
document hash. There is no reason why CouchDB, or the clients, should worry
about canonicalising the actual documents.
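
To illustrate the idea, here is a minimal sketch in Python rather than anything
from the CouchDB source. The function names (`_nfc`, `document_hash`), the use
of SHA-256, and the sorted-key JSON serialisation are all assumptions made for
the example. The point is only that the NFC-normalised copy exists for the
duration of the hash computation, and the document as submitted is left alone.

```python
import hashlib
import json
import unicodedata


def _nfc(value):
    """Return a copy of a JSON-like value with every string NFC-normalised."""
    if isinstance(value, str):
        return unicodedata.normalize("NFC", value)
    if isinstance(value, list):
        return [_nfc(item) for item in value]
    if isinstance(value, dict):
        return {_nfc(key): _nfc(item) for key, item in value.items()}
    return value  # numbers, booleans and None need no normalisation


def document_hash(doc):
    """Hash a temporary canonical copy of doc; doc itself is never modified.

    Hypothetical helper: the hash function and serialisation are assumptions.
    """
    canonical = json.dumps(
        _nfc(doc), sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The same text in composed (U+00E9) and decomposed (e + U+0301) form.
composed = {"name": "caf\u00e9"}
decomposed = {"name": "cafe\u0301"}

# Both spellings hash identically, and the input keeps its original bytes.
assert document_hash(composed) == document_hash(decomposed)
assert decomposed["name"] == "cafe\u0301"
```

In this sketch neither the client nor the stored document ever sees the
normalised form; it is discarded as soon as the digest has been computed.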

> Where does this leave us?

Canonicalisation is a temporary step, used only while computing the hash. The
stored document is never changed, so there are no problems.

> > Unicode normalisation is an issue for clients because it requires they have
> > access to a Unicode NFC function.

Why would clients need to worry about this? CouchDB is creating the hashes.

Best,

-- 
Noah Slater, http://tumbolia.org/nslater
