couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Unique instance IDs?
Date Tue, 13 Dec 2011 01:03:39 GMT
On Mon, Dec 12, 2011 at 5:39 PM, Randall Leeds <randall.leeds@gmail.com> wrote:
> On Mon, Dec 12, 2011 at 09:01, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>> On Mon, Dec 12, 2011 at 1:09 AM, Jason Smith <jhs@iriscouch.com> wrote:
>>> On Mon, Dec 12, 2011 at 5:16 AM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>>>>> A couch URL is its unique identifier. A database URL is its unique
>>>>> identifier. This sounds like a too-clever-by-half optimization. IMHO.
>>>>>
>>>>> --
>>>>> Iris Couch
>>>>
>>>> To this I ask simply: What's the URL of my phone? Tying a URL to a
>>>> database is like identifying a person by their address. A UUID per
>>>> created database is much more fine grained, but has operations issues
>>>> with file handling and what not.
>>>
>>> Hi, Paul. A database is not a person. It is a resource, with a
>>> universal location.
>>>
>>> Databases can be replicated, or copied, or restored from backup. (Same
>>> for .ini files.)
>>>
>>> One .couch file can be served from different URLs; and one URL might
>>> serve different .couch files over time. The current replicator
>>> understands this and if anything seems fishy, it double-checks. (For
>>> example, the instance_start_time helps to detect wholesale replacement
>>> of .couch files.)
>>>
>>> The web assumes that mostly, but not always, a stable URL represents a
>>> stable resource. So does the replicator. Getting away from that seems
>>> difficult.
>>>
>>> --
>>> Iris Couch
>>
>> I think you've contradicted yourself. If a URL is the universal name
>> for a database, then how are we able to serve different databases
>> from the same URL?
>>
>> Tying a database to a URL is merely an artificial limitation because
>> we haven't thought of anything better. If we *did* think of a way to
>> uniquely identify databases that didn't break due to ops requirements
>> then that would be a much better fit to the CouchDB model. It is
>> difficult but that's because we haven't yet thought of a good way to
>> deal with what happens OOB when ops teams change server
>> configurations.
>
> Using anything other than the URL is a re-invention of DNS.

Whut?

> IMO, the more interesting thing to ask is "What is the URL of my
> phone?" and "How can it be sticky when I'm mobile?"
>

A URL is not the same as a database. Perhaps I'm having issues
conveying why that is so, but that's the point I'm generally
trying to make.

In other words, a URL is a locator; it's answering the question "where
is this thing at?" As the phone (or even laptop) situation shows, there
is more than one answer, but the *implicit* idea I'm trying to point out
is that the definition of "thing" in that question is a constant.

Having a UUID for every database created is the ideal
harmonious-to-theory manifestation of "what is a db?" but we have to
deal with the reality that people may copy a file, which makes things a
bit weird because there are then two instances of the same UUID db.
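
A minimal sketch of that distinction, assuming a hypothetical model in
which each database file stores a UUID assigned at creation. The names
DbFile, copy_file, and served_at are made up for illustration and are
not CouchDB APIs. It shows the identity surviving a change of URL, and
a file copy producing two instances with the same UUID:

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DbFile:
    """Hypothetical database file carrying a UUID assigned at creation."""
    db_uuid: str = field(default_factory=lambda: uuid.uuid4().hex)


def copy_file(original: DbFile) -> DbFile:
    # A byte-for-byte copy carries the stored UUID along with the data.
    return DbFile(db_uuid=original.db_uuid)


phone_db = DbFile()

# The locator answers "where is it?" and changes as the phone moves.
served_at = {"http://10.0.0.5:5984/notes": phone_db}
served_at = {"http://192.168.1.20:5984/notes": phone_db}  # new network

# The copy problem: two separate instances now claim one identity.
backup = copy_file(phone_db)
same_identity = backup.db_uuid == phone_db.db_uuid
distinct_files = backup is not phone_db
```

Both same_identity and distinct_files come out true, which is exactly
the weird case: the UUID alone cannot tell the two instances apart.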

> There's actually no problem with moving DBs around today, except that
> replication starts over (unless you change host names to match).

The "except that replication starts over" is a very significant caveat
that I would say contradicts the entire "no problem" description.
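
To make the caveat concrete, here is a rough sketch, assuming
checkpoints are looked up by a hash of the source and target
identifiers. The replication_id and resume_seq helpers, the host names,
and the UUID strings are all hypothetical, and this is not CouchDB's
actual replication ID computation:

```python
import hashlib


def replication_id(source_key: str, target_key: str) -> str:
    """Hash a (source, target) pair into a checkpoint lookup key."""
    joined = f"{source_key}\n{target_key}".encode("utf-8")
    return hashlib.md5(joined).hexdigest()


def resume_seq(checkpoints: dict, rep_id: str) -> int:
    # With no checkpoint under this ID, replication restarts from 0.
    return checkpoints.get(rep_id, 0)


# Host-based identifiers: moving the source changes the lookup key.
target_url = "http://target.example:5984/db"
old_id = replication_id("http://old-host.example:5984/db", target_url)
host_checkpoints = {old_id: 1043}
moved_id = replication_id("http://new-host.example:5984/db", target_url)
host_based_resume = resume_seq(host_checkpoints, moved_id)

# Identity-based identifiers (assumed per-database UUIDs): the key is
# unchanged by the move, so the old checkpoint is still found.
uuid_id = replication_id("source-db-uuid", "target-db-uuid")
uuid_checkpoints = {uuid_id: 1043}
uuid_based_resume = resume_seq(uuid_checkpoints, uuid_id)
```

In this sketch host_based_resume falls back to 0 after the move, while
uuid_based_resume picks up at the recorded sequence.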

> So
> let's get back to the transitive checkpoints discussion.
>
> It's a very 2.0 idea, but, imagine a CouchDB consumes the following
> changes feed during a pull replication:
>
> ...
> {"seq":121,"id":"f39f35c075587342826f133327e0b69e","changes":[{"rev":"1-719a49d5a340bc043d36913e5ecaad0b"}]}
> {"seq":122,"id":"ca809b23ca16c761b24e82b5f247008c","changes":[{"rev":"1-d995113da09a9b058df55cef982aa8f6"}]}
>
> If we pretend that the above changes, without loss of generality,
> become local seq # 151 and 152 of the target couch, then a downstream
> replicator might see the following in the changes feed:
>
> {"seq":152,"id":"ca809b23ca16c761b24e82b5f247008c","changes":[{"rev":"1-d995113da09a9b058df55cef982aa8f6"},{"source":"https://othercouch/","seq":122}]}
>
> Further downstream couches might see:
>
> {"seq":1043,"id":"ca809b23ca16c761b24e82b5f247008c","changes":[{"rev":"1-d995113da09a9b058df55cef982aa8f6"},{"source":"https://othercouch/","seq":122},
> {"source":"https://couchtwo/","seq":152}]}
>
> Alternatively, a checkpoint entry in the sequence index might have its
> own id and not refer to a document update. In this way, the question
> "What checkpoints does this couch have?" becomes much more interesting
> than, and possibly obviates entirely, the question "What couch am I
> talking to?" The latter is a statement about identity when what we
> should care about is content.
>
> Some care would need to be taken to consider information leakage in
> certain scenarios and it may be desirable to allow signed checkpoints
> if couches are going to need guarantees about which changes they're
> receiving and who created them.
>
> -Randall

There are definitely considerations we need to make here in terms of
information leakage. On one hand, improving replication efficiency is
awesome, but there are obvious cases where I don't want to tell people
who I've replicated with.

More importantly though, we can address the "have I replicated to you
before?" question, and that's important enough to seriously consider
alternatives to host-based replication identifiers.
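
As a rough sketch of how the "have I replicated to you before?"
question could be answered from the feed itself, assume changes rows
carry the {"source": ..., "seq": ...} entries from Randall's example.
That row format is only a proposal in this thread, and the
provenance_checkpoints helper is hypothetical:

```python
import json


def provenance_checkpoints(change_rows):
    """Map each upstream source to the highest upstream seq seen."""
    checkpoints = {}
    for row in change_rows:
        for entry in row.get("changes", []):
            if "source" in entry:  # skip ordinary {"rev": ...} entries
                source, seq = entry["source"], entry["seq"]
                checkpoints[source] = max(seq, checkpoints.get(source, 0))
    return checkpoints


# The downstream row from the example above, as one line of a feed.
feed = [
    '{"seq":152,"id":"ca809b23ca16c761b24e82b5f247008c","changes":'
    '[{"rev":"1-d995113da09a9b058df55cef982aa8f6"},'
    '{"source":"https://othercouch/","seq":122}]}',
]
rows = [json.loads(line) for line in feed]
checkpoints = provenance_checkpoints(rows)

# A couch never seen in the feed has no checkpoint and starts from 0.
seen_before = checkpoints.get("https://othercouch/", 0)
never_seen = checkpoints.get("https://unknown.example/", 0)
```

Note that this sketch still keys checkpoints by source URL, as the
example rows do, so it inherits the locator problem discussed earlier
unless the source field carries a stable identifier instead.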
