jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Dürig <mdue...@apache.org>
Subject [jr3 microkernel] Change log consolidation
Date Fri, 03 Feb 2012 18:44:30 GMT


While working on a new transient space implementation [1] I discovered 
that recreating a list of changes from transient modifications to the 
hierarchy is like opening a can of worms. Rethinking things it turned 
out, that it is much simpler to directly consolidate the change log. 
This approach is much more flexible and versatile since knowledge of the 
persisted hierarchy is not necessary at all. Thus this approach should 
work equally well with change logs for jr3, the Microkernel, Jackrabbit 
core, JCR and SPI.

The algorithm is capable of creating minimal change logs from any valid 
change log (that is from any change log that can be applied to some 

I extensively tested the algorithm on randomly created change logs [3]. 
This resulted in some interesting numbers: consolidation results in an 
decrease of about 45% of these randomly generated change logs. 
Furthermore the resulting size of the H2 database files *decreased up to 

The implementation [2] is currently part of the jackrabbit-microkernel 
module in the sandbox. However it does not have any crucial dependencies 
so it should be easy to adapt for other usages.

The gory details of the algorithm are based on some algebraic properties 
of move operations on paths as detailed in the following paragraphs.

The change log consolidation algorithm implemented in the reduce method 
[2] is based on algebraic properties of move operations on paths. The 
other operations (add node, remove node and set property) are 
generalized to move operations. Consolidation relies on reduction and 
commutation rules of move operations.

A move operation resembles a map on a hierarchy (of nodes and 
properties). A change log consisting of k move operations m_1 to m_k is 
thus the composition of the individual moves: m_1 *, ..., * m_k (using * 
for function composition: f(g(x)) = (f * g)(x)).

Some definitions, notation and propositions:

* Let NIL denote a path which never occurs in any hierarchy.

* Order on paths: let p, q be paths.
   - p < q iff p != NIL and q != NIL and p is an ancestor of q.
   - p <= q iff p < q or p == q != NIL

* Conflict of paths: let p, q be paths.
   - p ~ q (p conflicts with q) iff either p <= q or q <= p

* Substitution in paths: let p, q, r be paths.
   - [p -> q]r = r if p is not an ancestor of r and
   - [p -> q]r = s where s is the path resulting from replacing
     the ancestor p in r with q otherwise.

* Let p, q be paths. Then p:q denotes a move operation where the
   node at p is moved to a new node at q.

* Valid moves: leq p, q be paths.
   - p:q is valid iff p !~ q or p = q
   - if p:q is not valid, it is invalid

Invalid moves are exactly those which would result in a node being moved 
to an ancestor/descendant of itself.

* Identity on moves: let p, q be paths.
   - p:q = ID iff p = q.

* Conflict on moves: let p, q, r, s be paths.
   - p:q ~ r:s (p:q conflicts with r:s) iff either p ~ r or p ~ s
     or q ~ r or q ~ s.

* Strict commutativity of moves: let m, n be moves.
   - m * n = n * m iff m !~ n

* Substitutions in moves: let p, q, r, s be paths.
   - [p -> q]r:s = [p -> q]r:[p -> q]s

* Let p be a path and let +p denote an add node operation and let -p
   denote a remove node operation for a node at path p.
   - +p = NIL:p That is, adding a node is represented by a move from a
     unknown source.
   - p = p:NIL. That is, removing a node is represented by a move to an
     unknown sink.

Let m = p:q, n = r:s with p, q, r, s != NIL be valid moves with m != ID 
and n != ID. Then the following reduction and commutation rules apply:

1.  p!~ r:  m * n = n * m
2.  p < r:  illegal (since this implies q <= r which implies p ~ q and
             thus m invalid)
3.  p = r:  illegal (since this implies q <= r which implies p ~ q and
             this m invalid)
4.  p > r:  does not commute if q < s. Otherwise m * n = n * [r -> s]m
5.  p!~ s:  m * n = n * m
6.  p < s:  illegal (since this implies p ~ q and thus m invalid)
7.  p = s:  does not commute
8.  p > s:  illegal (since p > s implies there is an s already which
             will conflict with r:s)
9.  q!~ r:  m * n = n * m
10. q < r:  m * n = [q -> p]n * m
11. q = r:  m * n = p:s (transitivity of moves)
12. q > r:  m * n = n * [r -> s]m
13. q!~ s:  m * n = n * m
14. q < s:  does not commute if p > r. Otherwise m * n = [q -> p]n * m
15. q = s:  illegal (since s conflicts with r:s)
16. q > s:  illegal (since s conflicts with r:s)

Allowing add node and remove node operations the following additional 
conditions apply:

Let m = p:q, n = r:s be valid moves with m != ID and n != ID. Then the 
reduction and commutations rules 1. to 16. apply with extra conditions 
on 4., 10., 12. and 14.:

4'.  if s = NIL and q = NIL then m * n = -r. Otherwise if s = NIL then
      m, n do not commute.
10'. illegal if p = NIL
12'. if s = NIL then m * n = -r * -p
14'. illegal if p = NIL


[1] http://markmail.org/message/qkkcvtmtapas2cx4

View raw message