couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Unique instance IDs?
Date Tue, 13 Dec 2011 01:12:10 GMT
On Mon, Dec 12, 2011 at 6:30 PM, Jason Smith <jhs@iriscouch.com> wrote:
> On Tue, Dec 13, 2011 at 12:01 AM, Paul Davis
> <paul.joseph.davis@gmail.com> wrote:
>> I think you've contradicted yourself. If a URL is the universal name
>> for a database, then how are we able to serve different databases
>> from the same URL?
>
> The contradiction was intentional. URLs are usually but not always
> stable. To re-invent a "universal identifier" for a "resource" sounds
> futile, IMHO.
>

I'm going to be a bit pedantic here. This is on purpose, and I've
warned people. But I'll hopefully tie it back to reality.

First, URL stands for "uniform resource locator", whereas URI stands for
"uniform resource identifier". There's a very, very important
distinction between these two things.

In CouchDB land we allow two resources to share a URL: if you delete a
db and then create a new db with the same name, those are two logically
distinct databases, at least until a replication pulls the new one into
the "replicated web of dbs", where things get a bit more subtle.

But I think this recycling of URLs firmly places them outside the
scope of URIs in this case. If we had URLs of the form
"http://host:port/$UUID", then I would argue in favor of a "URL is
URI" or perhaps "URL contains URI" type of approach. But we don't.
The URL is merely an alias for the actual database resource.
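The URL/URI distinction can be made concrete with a toy model. The sketch below assumes, hypothetically, that every database is assigned a UUID when it is created and that the name in the URL merely points at whichever database currently owns it. `Server`, `create_db`, `delete_db` and `resolve` are illustrative names, not CouchDB's HTTP API, and port 5984 is used only as an example. Deleting and re-creating `mydb` recycles the URL but yields a different identity.

```python
import uuid


class Server:
    """Hypothetical model, not CouchDB's implementation: every database gets
    a UUID when it is created, and the name in the URL is only an alias that
    points at whichever database currently owns that name."""

    def __init__(self, base_url: str):
        self.base_url = base_url
        self.aliases: dict[str, str] = {}  # db name -> UUID of current database

    def create_db(self, name: str) -> str:
        if name in self.aliases:
            raise ValueError(f"database {name!r} already exists")
        db_uuid = uuid.uuid4().hex  # the identity of this database
        self.aliases[name] = db_uuid
        return db_uuid

    def delete_db(self, name: str) -> None:
        # Deleting frees the alias; the old UUID is never handed out again.
        del self.aliases[name]

    def resolve(self, name: str) -> str:
        """Map a URL alias to the identity of the database behind it."""
        return self.aliases[name]

    def url(self, name: str) -> str:
        return f"{self.base_url}/{name}"


server = Server("http://localhost:5984")
first = server.create_db("mydb")
url_before = server.url("mydb")
server.delete_db("mydb")
second = server.create_db("mydb")

print(url_before == server.url("mydb"))  # True: the URL was recycled
print(first == second)                   # False: a different database
```

In this model a replicator keyed on `resolve(name)` rather than on the URL would notice that the database behind the name had changed.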

>> Tying a database to a URL is merely an artificial limitation because
>> we haven't thought of anything better. If we *did* think of a way to
>> uniquely identify databases that didn't break due to ops requirements
>> then that would be a much better fit to the CouchDB model. It is
>> difficult but that's because we haven't yet thought of a good way to
>> deal with what happens OOB when ops teams change server
>> configurations.
>
> All great points. My personal candidate is to look at the rsync
> protocol. From a foggy memory:
>
> * No names, no ids. It's always comparing data against data.
> * Both sides do normal and rolling checksums (perhaps memoized or
> incremental map/reduced for couch)
> * Always 1 round-trip. Receiver sends its checksums to sender, sender
> sends back the updates.
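For readers unfamiliar with rsync, the rolling checksum and single round trip described in the bullets above can be sketched roughly as follows. This follows rsync's weak checksum (two 16-bit sums that can be slid forward one byte in O(1)), with MD5 standing in for the strong checksum. The function names are hypothetical, and real rsync details such as skipping ahead after a match and sending literal data between matches are omitted.

```python
import hashlib

M = 1 << 16  # modulus of rsync's weak checksum (two 16-bit halves)


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the weak checksum (a, b) of a block from scratch, in O(len)."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a: int, b: int, out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window by one byte in O(1): drop out_byte, add in_byte."""
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def receiver_signatures(old: bytes, n: int) -> dict:
    """Receiver: checksum each non-overlapping n-byte block of its copy."""
    sigs: dict = {}
    for off in range(0, len(old) - n + 1, n):
        block = old[off:off + n]
        sigs.setdefault(weak_checksum(block), []).append(
            (off, hashlib.md5(block).digest()))
    return sigs


def sender_matches(new: bytes, sigs: dict, n: int) -> list[tuple[int, int]]:
    """Sender: test the window at every offset of its copy and return
    (new_offset, old_offset) pairs whose weak and strong checksums match."""
    matches = []
    if len(new) < n:
        return matches
    a, b = weak_checksum(new[:n])
    for off in range(len(new) - n + 1):
        if off > 0:
            a, b = roll(a, b, new[off - 1], new[off + n - 1], n)
        for old_off, strong in sigs.get((a, b), []):
            if hashlib.md5(new[off:off + n]).digest() == strong:
                matches.append((off, old_off))
                break
    return matches


old = b"abcdefghijklmnop"
new = b"XXabcdefghYY"
print(sender_matches(new, receiver_signatures(old, 4), 4))  # [(2, 0), (6, 4)]
```

The receiver hashes all of its data and the sender evaluates a window at every byte offset of its own copy, so both sides read everything they hold on every sync.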

I'm not sure I see your point. Rsync is a significantly different
protocol from CouchDB replication. In fact, the entire point of
CouchDB's replication checkpoints is to avoid rolling-checksum-style
replication. That is, replication in the extreme best case is O(1) for
CouchDB, whereas rsync has to be at least O(N+M), since it must checksum
both copies to compare them.
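The checkpoint argument can be illustrated with a toy model. Real CouchDB replication reads the source's `_changes` feed starting from a `since` sequence and records its progress in `_local` checkpoint documents; the `Database` class and `replicate` function below are hypothetical in-memory stand-ins that only mimic that idea. They ignore revision trees, conflicts, and the fact that the real changes feed reports only the latest change per document.

```python
class Database:
    """Toy database (hypothetical, not CouchDB's API): every write appends
    to a change log tagged with an increasing update sequence number."""

    def __init__(self):
        self.docs: dict[str, str] = {}
        self.changes: list[tuple[int, str, str]] = []  # (seq, doc_id, rev)

    @property
    def update_seq(self) -> int:
        return len(self.changes)

    def put(self, doc_id: str, rev: str) -> None:
        self.docs[doc_id] = rev
        self.changes.append((self.update_seq + 1, doc_id, rev))

    def changes_since(self, since: int) -> list[tuple[int, str, str]]:
        # Sequence numbers are 1, 2, 3, ... so entry `seq` sits at index seq - 1
        # and everything after `since` is a slice: older history is not scanned.
        return self.changes[since:]


def replicate(source: Database, target: Database, checkpoint: int) -> tuple[int, int]:
    """Copy changes made after `checkpoint`; return (new_checkpoint, docs_copied).
    The work depends on how much changed, not on the size of either database."""
    copied = 0
    for seq, doc_id, rev in source.changes_since(checkpoint):
        target.put(doc_id, rev)
        checkpoint = seq
        copied += 1
    return checkpoint, copied


src, tgt = Database(), Database()
src.put("a", "1-x")
src.put("b", "1-y")
first = replicate(src, tgt, 0)          # (2, 2): initial copy
second = replicate(src, tgt, first[0])  # (2, 0): nothing new, constant work
src.put("a", "2-z")
third = replicate(src, tgt, second[0])  # (3, 1): only the one new change
```

The second pass is the "extreme best case": with nothing new since the checkpoint it does a constant amount of work, whereas a checksum comparison would still have to read both copies in full.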

> People interested in CouchDB might find this OLS talk and
> transcript fascinating.
>
> http://olstrans.sourceforge.net/release/OLS2000-rsync/OLS2000-rsync.html
>
> --
> Iris Couch
