On Fri, Feb 3, 2012 at 7:44 PM, Michael Dürig <mduerig@apache.org> wrote:
>
> Hi,
>
> While working on a new transient space implementation [1] I discovered that
> recreating a list of changes from transient modifications to the hierarchy
> is like opening a can of worms. Rethinking things it turned out, that it is
> much simpler to directly consolidate the change log. This approach is much
> more flexible and versatile since knowledge of the persisted hierarchy is
> not necessary at all. Thus this approach should work equally well with
> change logs for jr3, the Microkernel, Jackrabbit core, JCR and SPI.
>
> The algorithm is capable of creating minimal change logs from any valid
> change log (that is from any change log that can be applied to some
> hierarchy).
>
> I extensively tested the algorithm on randomly created change logs [3]. This
> resulted in some interesting numbers: consolidation results in an decrease
> of about 45% of these randomly generated change logs. Furthermore the
> resulting size of the H2 database files *decreased up to 200%*!
>
> The implementation [2] is currently part of the jackrabbitmicrokernel
> module in the sandbox. However it does not have any crucial dependencies so
> it should be easy to adapt for other usages.
>
> The gory details of the algorithm are based on some algebraic properties of
> move operations on paths as detailed in the following paragraphs.
>
> The change log consolidation algorithm implemented in the reduce method [2]
> is based on algebraic properties of move operations on paths. The other
> operations (add node, remove node and set property) are generalized to move
> operations. Consolidation relies on reduction and commutation rules of move
> operations.
>
> A move operation resembles a map on a hierarchy (of nodes and properties). A
> change log consisting of k move operations m_1 to m_k is thus the
> composition of the individual moves: m_1 *, ..., * m_k (using * for function
> composition: f(g(x)) = (f * g)(x)).
>
> Some definitions, notation and propositions:
>
> * Let NIL denote a path which never occurs in any hierarchy.
>
> * Order on paths: let p, q be paths.
>  p < q iff p != NIL and q != NIL and p is an ancestor of q.
>  p <= q iff p < q or p == q != NIL
>
> * Conflict of paths: let p, q be paths.
>  p ~ q (p conflicts with q) iff either p <= q or q <= p
>
> * Substitution in paths: let p, q, r be paths.
>  [p > q]r = r if p is not an ancestor of r and
>  [p > q]r = s where s is the path resulting from replacing
> the ancestor p in r with q otherwise.
>
> * Let p, q be paths. Then p:q denotes a move operation where the
> node at p is moved to a new node at q.
>
> * Valid moves: leq p, q be paths.
>  p:q is valid iff p !~ q or p = q
>  if p:q is not valid, it is invalid
>
> Invalid moves are exactly those which would result in a node being moved to
> an ancestor/descendant of itself.
>
> * Identity on moves: let p, q be paths.
>  p:q = ID iff p = q.
>
> * Conflict on moves: let p, q, r, s be paths.
>  p:q ~ r:s (p:q conflicts with r:s) iff either p ~ r or p ~ s
> or q ~ r or q ~ s.
>
> * Strict commutativity of moves: let m, n be moves.
>  m * n = n * m iff m !~ n
>
> * Substitutions in moves: let p, q, r, s be paths.
>  [p > q]r:s = [p > q]r:[p > q]s
>
> * Let p be a path and let +p denote an add node operation and let p
> denote a remove node operation for a node at path p.
>  +p = NIL:p That is, adding a node is represented by a move from a
> unknown source.
>  p = p:NIL. That is, removing a node is represented by a move to an
> unknown sink.
>
>
> Let m = p:q, n = r:s with p, q, r, s != NIL be valid moves with m != ID and
> n != ID. Then the following reduction and commutation rules apply:
>
> 1. p!~ r: m * n = n * m
> 2. p < r: illegal (since this implies q <= r which implies p ~ q and
> thus m invalid)
> 3. p = r: illegal (since this implies q <= r which implies p ~ q and
> this m invalid)
> 4. p > r: does not commute if q < s. Otherwise m * n = n * [r > s]m
> 5. p!~ s: m * n = n * m
> 6. p < s: illegal (since this implies p ~ q and thus m invalid)
> 7. p = s: does not commute
> 8. p > s: illegal (since p > s implies there is an s already which
> will conflict with r:s)
> 9. q!~ r: m * n = n * m
> 10. q < r: m * n = [q > p]n * m
> 11. q = r: m * n = p:s (transitivity of moves)
> 12. q > r: m * n = n * [r > s]m
> 13. q!~ s: m * n = n * m
> 14. q < s: does not commute if p > r. Otherwise m * n = [q > p]n * m
> 15. q = s: illegal (since s conflicts with r:s)
> 16. q > s: illegal (since s conflicts with r:s)
>
> Allowing add node and remove node operations the following additional
> conditions apply:
>
> Let m = p:q, n = r:s be valid moves with m != ID and n != ID. Then the
> reduction and commutations rules 1. to 16. apply with extra conditions on
> 4., 10., 12. and 14.:
>
> 4'. if s = NIL and q = NIL then m * n = r. Otherwise if s = NIL then
> m, n do not commute.
> 10'. illegal if p = NIL
> 12'. if s = NIL then m * n = r * p
> 14'. illegal if p = NIL
sounds very complicated ;) i admit that i don't
fully understand it (ok, i didn't try very hard
and gave up somewhere in the middle).
could you please provide a simple example?
what is the benefit of this approach compared
to the imlementation in jackrabbitcore
(which does consolidate transient changes)?
cheers
stefan
>
> Michael
>
>
> [1] http://markmail.org/message/qkkcvtmtapas2cx4
> [2]
> http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbitmicrokernel/src/main/java/org/apache/jackrabbit/state/ChangeLog.java?view=markup
> [3]
> http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbitmicrokernel/src/test/java/org/apache/jackrabbit/state/ChangeLogFuzzTest.java?view=markup
