couchdb-dev mailing list archives

From Antony Blakey <antony.bla...@gmail.com>
Subject Re: Fail on a simple case on replication
Date Mon, 23 Feb 2009 23:34:52 GMT

On 24/02/2009, at 9:32 AM, Dean Landolt wrote:

>> Can you suggest how we improve the wiki docs to satisfy this? In my
>> opinion, the docs are clear* and the term is overloaded and  
>> confusing.
>>
>> * http://wiki.apache.org/couchdb/Document_revisions has
>> "You cannot rely on document revisions for any other purpose
>> than concurrency control." in bold letters.
>>
>> I stated this in earlier discussions as well: Even if our  
>> documentation
>> were perfect, we don't control how people learn about CouchDB. We
>> only control the API and we should work hard to get it right.
>>
>> The way it stands now, a lot of people new to CouchDB get it wrong
>> because "revision" is a familiar term and they attach the  
>> behaviour
>> they already associate with it. That's how humans learn. In this case
>> we make the learning hard.

Firstly, I completely agree that one should consider the implications  
of using certain terms; the baggage and context such terms bring with  
them.

<flamesuit on>
OTOH, one should use the correct term and not redefine existing terms  
to suit one's own purpose. In a tangentially related way, the use of  
the term RESTful wrt CouchDB is a marketing abomination.
</flamesuit off>

The documentation about replication, the role of revisions, the lack  
of inter-document consistency guarantees (including, crucially to the  
operation model, the lack of Monotonic Write guarantees), really needs  
to be expanded.

The consequences of CouchDB's underlying model aren't immediately  
obvious, and should be spelled out, as I started to do here: http://mail-archives.apache.org/mod_mbox/couchdb-dev/200902.mbox/%3c0FDDC57C-DB78-4241-86DE-549FECC8B558@gmail.com%3e

  - which was obviously in the context of changing that mechanism, but  
still the explanation and references are useful.
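
As a rough illustration of "revisions are for concurrency control only", here is a toy in-memory sketch. It is not CouchDB's API: ToyStore, Conflict, and the integer revision tokens are all hypothetical names chosen for the example, and it assumes only the latest revision of a document is kept. It shows a write being rejected when the caller presents a stale revision, which is the single guarantee the revision token is meant to provide.

```python
# Toy model only: NOT CouchDB's API. All names here are hypothetical.
class Conflict(Exception):
    """Raised when the caller's revision token is stale."""


class ToyStore:
    def __init__(self):
        # doc_id -> (rev, body); only the latest revision is retained,
        # so older revisions cannot be read back as history.
        self._docs = {}

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        # The revision is checked purely to detect concurrent updates.
        if rev != current_rev:
            raise Conflict(f"expected rev {current_rev!r}, got {rev!r}")
        new_rev = 1 if current_rev is None else current_rev + 1
        self._docs[doc_id] = (new_rev, body)
        return new_rev


store = ToyStore()
rev1 = store.put("doc", {"n": 1})            # create
rev2 = store.put("doc", {"n": 2}, rev=rev1)  # update with current rev

try:
    store.put("doc", {"n": 3}, rev=rev1)     # stale rev: must be rejected
    stale_rejected = False
except Conflict:
    stale_rejected = True
```

Nothing in this sketch lets a client fetch the body that went with rev1 once rev2 exists, which is the behaviour that surprises people who read "revision" in the version-control sense.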

> I couldn't agree more with this sentiment, but revision still  
> strikes me as
> the right term. Perhaps the easiest way to fix this misconception is  
> for
> there to actually be a way to keep old revisions around for good :)
>
> Would it be overly difficult to just add in the ability to keep a  
> full rev
> history based on a config setting? The replication api would need to
> accommodate this, of course, and if the machine you're replicating  
> from
> doesn't also keep old revisions around you're SOL, but is there any  
> other
> compelling reason to not offer this option? If it wouldn't  
> complicate the
> code base, this seems like a helpful feature. Sure, it could be  
> wasteful and
> should be off by default, but if your dataset is relatively small,  
> this
> config flag would be pretty nice to have, and it could help clear up  
> this
> confusion.

Danger Will Robinson!

The problem here is that you then need to make certain guarantees  
about revisions to make them at all useful, and you get into a  
discussion like the above email thread.

IMO, discussing these issues without having read the relevant  
literature around replication models is a waste of time. Serious  
research has been done into this, and (once again, IMO) it is more  
productive to advance that understanding than to try (and possibly fail)  
to reinvent the wheel.

Antony Blakey
--------------------------
CTO, Linkuistics Pty Ltd
Ph: 0438 840 787

A priest, a minister and a rabbi walk into a bar. The bartender says  
"What is this, a joke?"


