On Fri, Feb 3, 2012 at 7:44 PM, Michael Dürig wrote: > > Hi, > > While working on a new transient space implementation [1] I discovered that > recreating a list of changes from transient modifications to the hierarchy > is like opening a can of worms. Rethinking things it turned out, that it is > much simpler to directly consolidate the change log. This approach is much > more flexible and versatile since knowledge of the persisted hierarchy is > not necessary at all. Thus this approach should work equally well with > change logs for jr3, the Microkernel, Jackrabbit core, JCR and SPI. > > The algorithm is capable of creating minimal change logs from any valid > change log (that is from any change log that can be applied to some > hierarchy). > > I extensively tested the algorithm on randomly created change logs [3]. This > resulted in some interesting numbers: consolidation results in an decrease > of about 45% of these randomly generated change logs. Furthermore the > resulting size of the H2 database files *decreased up to 200%*! > > The implementation [2] is currently part of the jackrabbit-microkernel > module in the sandbox. However it does not have any crucial dependencies so > it should be easy to adapt for other usages. > > The gory details of the algorithm are based on some algebraic properties of > move operations on paths as detailed in the following paragraphs. > > The change log consolidation algorithm implemented in the reduce method [2] > is based on algebraic properties of move operations on paths. The other > operations (add node, remove node and set property) are generalized to move > operations. Consolidation relies on reduction and commutation rules of move > operations. > > A move operation resembles a map on a hierarchy (of nodes and properties). A > change log consisting of k move operations m_1 to m_k is thus the > composition of the individual moves: m_1 *, ..., * m_k (using * for function > composition: f(g(x)) = (f * g)(x)). > > Some definitions, notation and propositions: > > * Let NIL denote a path which never occurs in any hierarchy. > > * Order on paths: let p, q be paths. >  - p < q iff p != NIL and q != NIL and p is an ancestor of q. >  - p <= q iff p < q or p == q != NIL > > * Conflict of paths: let p, q be paths. >  - p ~ q (p conflicts with q) iff either p <= q or q <= p > > * Substitution in paths: let p, q, r be paths. >  - [p -> q]r = r if p is not an ancestor of r and >  - [p -> q]r = s where s is the path resulting from replacing >    the ancestor p in r with q otherwise. > > * Let p, q be paths. Then p:q denotes a move operation where the >  node at p is moved to a new node at q. > > * Valid moves: leq p, q be paths. >  - p:q is valid iff p !~ q or p = q >  - if p:q is not valid, it is invalid > > Invalid moves are exactly those which would result in a node being moved to > an ancestor/descendant of itself. > > * Identity on moves: let p, q be paths. >  - p:q = ID iff p = q. > > * Conflict on moves: let p, q, r, s be paths. >  - p:q ~ r:s (p:q conflicts with r:s) iff either p ~ r or p ~ s >    or q ~ r or q ~ s. > > * Strict commutativity of moves: let m, n be moves. >  - m * n = n * m iff m !~ n > > * Substitutions in moves: let p, q, r, s be paths. >  - [p -> q]r:s = [p -> q]r:[p -> q]s > > * Let p be a path and let +p denote an add node operation and let -p >  denote a remove node operation for a node at path p. >  - +p = NIL:p That is, adding a node is represented by a move from a >    unknown source. >  - p = p:NIL. That is, removing a node is represented by a move to an >    unknown sink. > > > Let m = p:q, n = r:s with p, q, r, s != NIL be valid moves with m != ID and > n != ID. Then the following reduction and commutation rules apply: > > 1.  p!~ r:  m * n = n * m > 2.  p < r:  illegal (since this implies q <= r which implies p ~ q and >            thus m invalid) > 3.  p = r:  illegal (since this implies q <= r which implies p ~ q and >            this m invalid) > 4.  p > r:  does not commute if q < s. Otherwise m * n = n * [r -> s]m > 5.  p!~ s:  m * n = n * m > 6.  p < s:  illegal (since this implies p ~ q and thus m invalid) > 7.  p = s:  does not commute > 8.  p > s:  illegal (since p > s implies there is an s already which >            will conflict with r:s) > 9.  q!~ r:  m * n = n * m > 10. q < r:  m * n = [q -> p]n * m > 11. q = r:  m * n = p:s (transitivity of moves) > 12. q > r:  m * n = n * [r -> s]m > 13. q!~ s:  m * n = n * m > 14. q < s:  does not commute if p > r. Otherwise m * n = [q -> p]n * m > 15. q = s:  illegal (since s conflicts with r:s) > 16. q > s:  illegal (since s conflicts with r:s) > > Allowing add node and remove node operations the following additional > conditions apply: > > Let m = p:q, n = r:s be valid moves with m != ID and n != ID. Then the > reduction and commutations rules 1. to 16. apply with extra conditions on > 4., 10., 12. and 14.: > > 4'.  if s = NIL and q = NIL then m * n = -r. Otherwise if s = NIL then >     m, n do not commute. > 10'. illegal if p = NIL > 12'. if s = NIL then m * n = -r * -p > 14'. illegal if p = NIL sounds very complicated ;) i admit that i don't fully understand it (ok, i didn't try very hard and gave up somewhere in the middle). could you please provide a simple example? what is the benefit of this approach compared to the imlementation in jackrabbit-core (which does consolidate transient changes)? cheers stefan > > Michael > > > [1] http://markmail.org/message/qkkcvtmtapas2cx4 > [2] > http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbit-microkernel/src/main/java/org/apache/jackrabbit/state/ChangeLog.java?view=markup > [3] > http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbit-microkernel/src/test/java/org/apache/jackrabbit/state/ChangeLogFuzzTest.java?view=markup