subversion-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Julian Foad <>
Subject Ev2 as a move-aware editor
Date Mon, 24 Jun 2013 15:22:01 GMT
For move tracking we need a move-aware editor.  svn_editor_t ("Ev2") is designed to support
moves.  However, I'm not clear how its move support is meant to work, on a fairly basic level:

  * What are the ordering rules for moves?

  * Are the move operations to be interprested in series or in parallel?

  * When an operation refers to a node that is involved in a move, how are the paths to be

From conversations with other devs I know I'm not the only one.  There are a number of rules
written down in <svn_editor.h> but to my reading they didn't seem to explain the overall
scheme, and they contain inconsistencies, some of which I have posted as queries to this list
in the past but remain unresolved [1].

Having failed to grok what is intended, I have deliberately put off trying to dig further
into Ev2's version of move support, until now, in order to think freely about the theory and
to focus more evenly on all the areas of move tracking.  (That's why I started using Ev1
in my experimental branch [2] and in my notes [3].)  But now we should tackle this.

So, please could somebody help to elucidate how Ev2 move support works or should work?

== Basic Theory ==

I can imagine two possible extremes for the overall mode of operation to which such an editor
*could* be designed: parallel and sequential.  I'll mention some essential characteristics
of each.

== Sequential ==

Edits are described sequentially, incrementally.  The consumer maintains a state that describes
the 'current' tree shape, starting with the initialtree shape, and being changed incrementally
by each edit operation.

Each edit operation refers to paths that exist in the 'current' tree state and/or that will
exist in the next state.

== Parallel ==

The producer sends all changes in parallel or in an unspecified order.  Each change is described
independently of all other changes: it refers to paths (or nodes) in a way that doesn't depend
on the other changes being sent.  In fact, rather than talking about 'operations' or 'changes',
the semantics required here is to describe the final state in any compact way, not necessarily
in terms of changes against the initial state.

(One case that may seem difficult to parallelize is adding a new parent directory and its
children.  How can the consumer accept a child node before the creation of the parent directory? 
The answer is simply that the consumer would be designed to operate this way, storing the
children either in their final form or in a temporary form, and linking them into the parent
once all necessary information has been received.)

== Hybrid ==

We could mix these and have a scheme that is partly parallel, partly sequential, according
to some additional rules.  That is fine, but more complex to describe and understand.

Note that while Ev1 is mostly sequential, the copy-from field of a copy (when used for a move)
is more like the parallel scheme, since it refers to a path in the initial tree state and
not a path in the current tree state.

== Cyclic moves ==

We need to accommodate arbitrary sets of moves, including cycles.  For an example, we'll
use a simple cycle, swapping two siblings A and B.  This ability is needed for two reasons:

  * Ev1 supports this (del A, copy A from B@OLD, del B, copy B from A@OLD).

  * A single edit must be able to represent the combination of any series of edits, and a
series of moves can result in a cycle even if each individual edit only supports non-cyclic
moves.  (1: mv B C.  2: mv A B.  3: mv C A.)

In a parallel scheme, no special treatment is necessary:

    A -> B
    B -> A

In a sequential scheme, there are two ways to accommodate cyclic moves:

  * Allow moving to a temporary path and then later to the final path:
    mv A tmp; mv B A; mv tmp B

  * Allow a 'swap A B' or 'rotate A B ...' operation:
    rotate A B

== Swap or Rotate ==

swap(A,B) == { A->B || B->A }

is a primitive operation.  ('||' indicates 'in parallel'.)

rotate(A,B,...N) == { A->B || B->... || ...->N || N->A }

is non-primitive, since any rotate can be composed from multiple sequential swaps.  For example,
rotate(A,B,C) == { swap(B,C); swap(A,B) }.  It is no more useful to rotate more than two
paths in one operation (rotate(A,B,C)) than to move more than two paths  in one operation
(move(A,B,C) == { A->B || B->C }).

So if we have a sequential scheme and we don't want to use temporary paths, we should include
the 'swap' primitive rather than a 'rotate'.

== Ev2 ==

One of Ev2's goals is to parallelize text modifications, in order to take advantage of Serf's
parallelism.  What about tree changes, and specifically what are the semantics of the 'move'
operation?  The Ev2 documentation indicates sequential requirements for the 'add' operation
(parent before children), and also in this rule:

 * - The ancestor of an added, copied-here, moved-here, rotated, or
 *   modified node may not be deleted. The ancestor may not be moved
 *   (instead: perform the move, *then* the edits).

The doc string for 'svn_editor_move' says:

 * Move the node at @a src_relpath to @a dst_relpath.
 * @a src_revision specifies the revision at which the receiver should
 * expect to find this node.  That is, @a src_relpath at the start of
 * the whole edit and @a src_relpath at @a src_revision must lie within
 * the same node-rev (aka history-segment).  This is just like the
 * revisions specified to svn_editor_delete() and svn_editor_rotate().
 * ...

The discussion implies that "the node at src_relpath" refers to a node in the initial tree
state.  In that respect, it's a parallel operation: the source reference doesn't depend on
previous moves.  But look at the 'move_cb' implementation in libsvn_fs/editor.c:

  # src_root == *initial* tree state

  svn_fs_copy(src_root, src_fspath, root, dst_fspath, scratch_pool)
  /* Notice: we're deleting the src repos path from the dst root. */
  # root == *current* tree state (txn)
  svn_fs_delete(root, src_fspath, scratch_pool)

The delete of src_relpath within the current tree state is potentially at odds with the copy
from src_relpath relative to the initial state.  The delete would delete the wrong thing
if the original node at this path has already been replaced, or would try to delete a non-existent
node if it had already been deleted or moved away.  Could it have been replaced or deleted
or moved away, if not itself then perhaps by a copy/move/add/delete of a parent directory?

Let's try the following scenario:

  A->A2 || A/B->B2  # move a child of a moved parent

Try with Ev2:

  move(src_relpath='A', src_rev=100, dst_relpath='A2', replaces_rev=-1)
  move(src_relpath='A/B', src_rev=100, dst_relpath='B2', replaces_rev=-1)

The implementation shown above would fail in the second move when trying to delete 'A/B',
because that path no longer exists in the transaction after the first move deleted the whole
subtree at 'A'.

The other way works fine:

  move(src_relpath='A/B', src_rev=100, dst_relpath='B2', replaces_rev=-1)
  move(src_relpath='A', src_rev=100, dst_relpath='A2', replaces_rev=-1)

So these moves are not parallelizable with this implementation.  (Is that implementation

Ev2 also documents (as the first Driving Order Restriction) that there must be an alter_directory
call for the parent of moved-away node 'A/B'.  What does this look like?

  alter_dir('A', ...)


  alter_dir('A2', ...)

?  Can the alter_dir come before or after the move of 'A' to 'A2'?  Is the path it references
'A' in either case, or 'A2' in either case, or 'A' if it comes before the move and 'A2' if
it comes after the move?

Can we start with a clear consensus of whether we're trying to deliver a sequential or a parallel

- Julian

[1] For example, email subject "Re: Ev2 -- Driving Order Restrictions" from Julian Foad on
2012-07-18, <>.

[2] 'move-tracking-1' branch, <>.

[3] Wiki page 'MoveDev/MoveDev', <>.

Join WANdisco's free daily demo sessions on Scaling Subversion for the Enterprise

View raw message