couchdb-dev mailing list archives

From "Jeff Hinrichs - DM&T" <dunde...@gmail.com>
Subject Re: proposed replication rev history changes
Date Sun, 08 Feb 2009 17:51:10 GMT
On Sun, Feb 8, 2009 at 11:18 AM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
> Patrick,
>
> The issue at hand is precisely that the current revisioning system
> makes this use case deterministically identifiable as a conflict. The
> proposed change means that we introduce the possibility that we are
> unable to determine if we have a real conflict or a 'conflict due to
> missing history'.
>
> It's possible I missed something that special casing the initial
> revision would solve, but as I read the proposal it doesn't really fix
> the underlying problem of possibly spurious conflicts while
> introducing more complexity into the code.
>

I agree with Paul; this seems to increase problems rather than reduce
them. Limiting the number of revisions seems to be an optimization that
should be handled by the client. If the user wants that type of
optimization, it would seem (though I could be misunderstanding) that
the client would just need to coordinate the clipping of revision
history by copying the current document to a new one and deleting
the old one. If that seems too problematic, simply removing
all previous revision history would seem to fit the current model
without introducing the new side effect. Couch could offer a
strip_revision capability.

Now that I've said that out loud -- it seems just as flawed as the
proposed patch.
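
To make the flaw concrete, here is a rough Python sketch using the
histories from Patrick's example further down the thread. The helpers
trim and classify are made up for illustration; they are not couch
functions, and couch keeps a revision tree rather than the flat lists
assumed here.

```python
def trim(history, limit):
    """Keep only the newest `limit` revision ids (list is oldest first)."""
    return history[-limit:]


def classify(local, remote):
    """Compare two flat revision histories that share a document id."""
    if local[-1] in remote or remote[-1] in local:
        # One side already contains the other's latest revision.
        return "no conflict"
    if set(local) & set(remote):
        # A shared ancestor exists, so the edits really diverged.
        return "real conflict"
    # No shared revision: different documents, or just missing history?
    return "undetermined"


base_a = ["1-foo", "2-bar", "3-baz", "4-biz"]
base_b = ["1-foo", "2-bad", "3-baf", "4-bif"]

print(classify(base_a, base_b))                    # full history
print(classify(trim(base_a, 3), trim(base_b, 3)))  # trimmed to 3
```

With full history this prints "real conflict"; once both sides are
trimmed to 3 it prints "undetermined", because "1-foo" is gone from
both lists.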

It appears that without full history, any attempt to optimize the
revision history is going to result in the same set of problems. So
it would seem that the client should do what it can to limit the
number of revisions of a given document, i.e. implement its own
revision algorithm on top of couch. The only other optimization would
seem to be for couch to keep track of which revisions it has already
replicated with a given peer and only send new ones. That record would
need to be stored outside of the actual document, otherwise replication
would never finish; perhaps that information could be stored in a view
rather than in the document itself. If the view is destroyed or lost,
then couch would need to build it again: lots of stuff would replicate
on the first replication, but the following ones would be optimized.
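
A minimal sketch of that bookkeeping, in Python. PeerLog and its
methods are hypothetical names for illustration only; this is not an
existing couch API, and the in-memory dict just stands in for whatever
storage lives outside the documents.

```python
from collections import defaultdict


class PeerLog:
    """Remembers which (doc_id, rev_id) pairs were already sent to each peer."""

    def __init__(self):
        # peer name -> set of (doc_id, rev_id) pairs already replicated
        self._sent = defaultdict(set)

    def pending(self, peer, revisions):
        """Return the revisions this peer has not been sent yet, in order."""
        seen = self._sent[peer]
        return [rev for rev in revisions if rev not in seen]

    def mark_sent(self, peer, revisions):
        """Record a successful replication of these revisions to the peer."""
        self._sent[peer].update(revisions)

    def reset(self):
        """Simulate losing the log: the next replication resends everything."""
        self._sent.clear()


log = PeerLog()
revs = [("doc1", "1-foo"), ("doc1", "2-bar")]
log.mark_sent("peer-b", log.pending("peer-b", revs))
revs.append(("doc1", "3-baz"))
print(log.pending("peer-b", revs))  # only the new revision is left to send
```

Calling reset() models the lost-view case: pending() then returns every
revision again, so the first replication afterwards is the expensive
one.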

Regards,
Jeff Hinrichs
> HTH,
> Paul Davis
>
> On Sun, Feb 8, 2009 at 12:14 PM, Patrick Antivackis
> <patrick.antivackis@gmail.com> wrote:
>> And how does today's revision system help in such a case?
>>
>>
>> 2009/2/8 Paul Davis <paul.joseph.davis@gmail.com>
>>
>>> On Sun, Feb 8, 2009 at 11:50 AM, Patrick Antivackis
>>> <patrick.antivackis@gmail.com> wrote:
>>> > I'm not sure I understood what you asked.
>>> >
>>> > It would be a document conflict that would need either manual
>>> > correction or, why not, an automatic correction that moves one of
>>> > the documents, but at least couch can tell for sure that it was not
>>> > the same document at the origin.
>>> >
>>> > What I don't understand is what today's revision system or the
>>> > proposed revision system adds for this kind of conflict, where two
>>> > different documents are created with the same Id on two different
>>> > nodes. Except that with the new revision proposal, you don't know
>>> > for sure whether it was the same or a different document at the
>>> > origin if replication occurs after you trimmed the reference to the
>>> > first revision.
>>> >
>>>
>>> I'm saying that your suggestion to always retain the first revision is
>>> going to run into problems when a document is created on two machines
>>> and thus has two initial revisions. Or rather, it will run into the
>>> same problems as Damien's proposal yet have the added complexity that
>>> we now have the special cased 'first revisions' info.
>>>
>>> Unless of course I'm missing something else in the details.
>>>
>>> >
>>> >
>>> >
>>> > 2009/2/8 Paul Davis <paul.joseph.davis@gmail.com>
>>> >
>>> >> On Sun, Feb 8, 2009 at 6:07 AM, Patrick Antivackis
>>> >> <patrick.antivackis@gmail.com> wrote:
>>> >> > 2009/2/8 Damien Katz <damien@apache.org>
>>> >> >
>>> >> >> You got everything right except this. It doesn't solve the
>>> >> >> problem, because on another node, I could have a document that
>>> >> >> looked like ["1-foo" "2-bif"]. That is a real edit conflict that
>>> >> >> wouldn't be caught by what I think you are proposing.
>>> >> >>
>>> >> >
>>> >> > That's right, there is a real edit conflict, but at least couchdb
>>> >> > can detect it based on the first revision reference that is always
>>> >> > kept. If you don't keep the reference to the first revision, you
>>> >> > can arrive at:
>>> >> > BaseA : ["1-foo"]
>>> >> > BaseB : empty
>>> >> > Replication :
>>> >> > BaseA : ["1-foo"]
>>> >> > BaseB : ["1-foo"]
>>> >> > Life goes on :
>>> >> > BaseA : ["1-foo" "2-bar" "3-baz" "4-biz"] but as it's trimmed to 3
>>> >> > you only keep ["2-bar" "3-baz" "4-biz"]
>>> >> > BaseB : ["1-foo" "2-bad" "3-baf" "4-bif"] but as it's trimmed to 3
>>> >> > you only keep ["2-bad" "3-baf" "4-bif"]
>>> >> > New replication :
>>> >> > ????? Same Id but no common revision, so what do we do? And couch
>>> >> > cannot even help to say whether it was the same doc or not at the
>>> >> > origin.
>>> >> >
>>> >>
>>> >> Patrick,
>>> >>
>>> >> I'm pretty sure I see where you're coming from, but can you explain
>>> >> what would happen if the same document ID were created on two servers?
>>> >> Each server would have a different 'first rev', so whose first rev
>>> >> would be carried on in the future?
>>> >>
>>> >> >> This is used during conflict detection to figure out if 2 tree
>>> >> >> fragments overlap. We don't actually store a sequential number for
>>> >> >> each revision, we store a revision tree of numbers, with the root
>>> >> >> of the tree being the offset from 0 where it was trimmed
>>> >> >> (technically it's stemmed).
>>> >> >>
>>> >> >>
>>> >> >
>>> >> > You are right, keeping track of the revision number is indeed
>>> >> > important, especially when a document with the same origin is
>>> >> > updated on different nodes. But couldn't it be replaced by a
>>> >> > timestamp? That is sequential too and gives even more information.
>>> >> >
>>> >> >
>>> >> >> Sometimes people can deal with spurious conflicts. This gives you
>>> >> >> the option. If you don't want spurious conflicts, don't use this
>>> >> >> feature.
>>> >> >>
>>> >> >> And if you want the same document to be edited over and over, 100s
>>> >> >> of thousands of times, this is really the only option. The revision
>>> >> >> history will get too big and slow down updates tremendously.
>>> >> >>
>>> >> > Sure, but I would say these are different use cases. Records
>>> >> > management, for example, needs to keep track of changes over a
>>> >> > period of time. And in all the CMS/ECM systems I have worked on,
>>> >> > cleanup of versions is based on time more than on the number of
>>> >> > revisions that have occurred.
>>> >> >
>>> >>
>>> >> HTH,
>>> >> Paul Davis
>>> >>
>>> >
>>>
>>
>
