couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: proposed replication rev history changes
Date Sun, 08 Feb 2009 18:28:13 GMT
On Sun, Feb 8, 2009 at 12:51 PM, Jeff Hinrichs - DM&T
<dundeemt@gmail.com> wrote:
> On Sun, Feb 8, 2009 at 11:18 AM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>> Patrick,
>>
>> The issue at hand is precisely that the current revisioning system
>> makes this use case deterministically identifiable as a conflict. The
>> proposed change means that we introduce the possibility that we are
>> unable to determine if we have a real conflict or a 'conflict due to
>> missing history'.
>>
>> It's possible I missed something that special casing the initial
>> revision would solve, but as I read the proposal it doesn't really fix
>> the underlying problem of possibly spurious conflicts while
>> introducing more complexity into the code.
>>
>
> I agree with Paul, this seems to increase instead of reduce problems.
> Limiting the number of revisions seems to be an optimization that
> should be handled by the client.  If the user wants that type of
> optimization, it would seem (though I could be misunderstanding), that
> the client would just need to coordinate the clipping of revision
> history by creating a new document by copying the current and deleting
> the old one. If that seems to be too problematic, the simple removal of
> all previous revision history would seem to fit the current model
> without introducing the new side-effect.  Couch could offer a
> strip_revision capability.
>
> Now that I've said that out loud -- it seems just as flawed as the
> proposed patch.
>
> It appears that without full history any attempt to optimize the
> revision history is going to result in the same set of problems.  So
> it would seem that the client should do what can be done to limit the
> number of revisions to a given document, i.e. implement their own
> revision algorithm on top of couch.  The only other optimization would
> seem to be for couch to keep track of what revisions it has already
> replicated with a given peer and only send new ones.  That would need
> to be stored outside of the actual document otherwise it would never
> finish, perhaps that information could be stored in a view and not the
> actual document.  If the view is destroyed or lost, then couch would
> need to start it again -- lots of stuff would replicate on the first
> replication, but then be optimized for the following events.
>
[snip]

Jeff,

I'm a bit confused so I'm not sure if I'm going to answer this right.

Firstly, the internal _rev system is most definitely *not* to be used
as a document revision history. I'm not sure whether you're suggesting
that, but I always feel that we need to point it out as often
as possible.

The internal _rev system is required to be able to determine document
conflicts in things like replication. That is all. So any clients that
want prior document versions must already set up their own history
algorithms.

The point of the proposed patch is to account for real-life limits.
If we were to keep the entire history of _revs, then we run into
issues of sending and storing all of that data during replication. On
the other end of the spectrum, if we kept no history, all replicated
edits would cause a conflict.

So the proposed patch is the obvious middle ground of keeping at most
N revisions (configurable by the user) for use in conflict detection.
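
To make the tradeoff concrete, here is a rough sketch in Python. It is
a toy model, not CouchDB's replication code: the function names
(truncate, classify), the newest-first list layout, and the rev ids
"r1".."r5" are all made up for the example. It only shows how capping
the history at N entries can turn a plain update into an apparent
conflict.

```python
def truncate(history, limit):
    """Keep only the `limit` newest revision ids (list is newest first)."""
    return history[:limit]


def classify(local_history, incoming_history):
    """Compare two revision histories (newest first) for one document.

    Hypothetical helper for illustration only. Returns one of
    "same", "update", "stale" or "conflict".
    """
    local_rev = local_history[0]
    incoming_rev = incoming_history[0]
    if local_rev == incoming_rev:
        return "same"
    if local_rev in incoming_history:
        # The incoming revision descends from what we already have.
        return "update"
    if incoming_rev in local_history:
        # We already have something newer than the incoming revision.
        return "stale"
    # No shared revision is visible: a real conflict, or missing history.
    return "conflict"


# The target last replicated at r1; the source has since made four edits.
target = ["r1"]
source_full = ["r5", "r4", "r3", "r2", "r1"]

print(classify(target, source_full))               # full history kept
print(classify(target, truncate(source_full, 3)))  # only 3 revs kept
print(classify(target, truncate(source_full, 1)))  # no history kept
```

With the full history the edit is recognized as an update. With N=3
the common ancestor r1 has already been dropped, so the same edit looks
like a conflict, and with N=1 every replicated edit does. A larger N
makes that less likely at the cost of more data to send and store.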

At the moment I see this as the easiest solution overall as it would
give users the best control of the different tradeoffs.

HTH,
Paul Davis
