couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Unique instance IDs?
Date Tue, 13 Dec 2011 01:40:23 GMT
On Mon, Dec 12, 2011 at 7:25 PM, Jason Smith <jhs@iriscouch.com> wrote:
> On Tue, Dec 13, 2011 at 8:03 AM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>> Having a UUID for every database created is the ideal
>> harmonious-to-theory manifestation of "what is a db?" but we have to
>> deal with reality when people may copy a file which makes things a bit
>> weird when there are two instances of a UUID db.
>
> You didn't say "harsh reality," but to list some legitimate situations
> where people might copy .couch files:
>
> * Restoring from backups
> * Cloning a VMWare image
> * Booting an EC2 AMI
> * NAS storage clusters
> * Couchbase mobile bootstrapping

Exactly the sorts of reasons why I haven't just slapped a UUID into
the db header. :D

>
>>> There's actually no problem with moving DBs around today, except that
>>> replication starts over (unless you change host names to match).
>>
>> The "except that replication starts over" is a very significant caveat
>> that I would say contradicts the entire "no problem" description.
>
> Nobody has shown that "replication starts over" is bad. The implicit
> assumption is that starting over is costly. At present, yes, that is
> true, but that's mostly a bunch of "no-op" round-trips diffing the
> revs.
>

No-op round trips are fine until you have to make millions of them
over edge networks. Now, there obviously isn't a huge uproar over this
inefficiency, because we'd be having a very different conversation if
there were. But the fact remains that the current situation is just
bad, and the only reason there hasn't been an uproar is that we don't
yet have a huge enterprise company that's been running some phone
replication db for fifteen years without upgrading.

It's always better to fix errors in our model before they cause issues, though.

> If there were a hypothetical single query which let the receiver
> assess its exact relationship to an arbitrary sender's data, I don't
> think "starts over" would sound as awful.
>

I agree wholeheartedly. And the easiest way I see to make that happen
is to decouple the host and db identities so that such a query becomes
possible. It's possible there's something elegant we could pull from
things like Merkle trees. I've spent time considering it and haven't
thought of anything, but I'd be tickled pink if there were a
reasonable solution there.
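
To make the decoupling idea concrete, here is a rough sketch in Python.
It is an illustration only, not how CouchDB computes replication IDs:
the function names, the per-database UUID, and the hashing scheme are
all assumptions made up for this example. It just shows why a
checkpoint key derived from the host changes when a database moves,
while one derived from a database-level identity would not.

```python
import hashlib


def _digest(*parts: str) -> str:
    # Join with a separator that cannot appear in the parts, then hash.
    return hashlib.sha256("\x00".join(parts).encode("utf-8")).hexdigest()


def host_based_id(host: str, db_name: str, target: str) -> str:
    # Hypothetical: the checkpoint key depends on where the db is served from.
    return _digest(host, db_name, target)


def uuid_based_id(db_uuid: str, target_uuid: str) -> str:
    # Hypothetical: the checkpoint key depends only on database identities.
    return _digest(db_uuid, target_uuid)


# The same database file, first on one host and then moved to another.
before_move = host_based_id("old-host:5984", "contacts", "phone/contacts")
after_move = host_based_id("new-host:5984", "contacts", "phone/contacts")

# Made-up identifiers standing in for per-database UUIDs.
stable_before = uuid_based_id("db-uuid-aaaa", "db-uuid-bbbb")
stable_after = uuid_based_id("db-uuid-aaaa", "db-uuid-bbbb")

print(before_move == after_move)      # False: replication would start over
print(stable_before == stable_after)  # True: the checkpoint could be reused
```

The catch is the one raised earlier in this thread: copying a .couch
file would also copy such a UUID, so two live instances would share one
identity, and this sketch does nothing to address that.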

> --
> Iris Couch
