jackrabbit-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stefan Guggisberg <stefan.guggisb...@gmail.com>
Subject Re: [jr3 microkernel] Change log consolidation
Date Sat, 04 Feb 2012 20:07:49 GMT
On Fri, Feb 3, 2012 at 7:44 PM, Michael Dürig <mduerig@apache.org> wrote:
>
> Hi,
>
> While working on a new transient space implementation [1] I discovered that
> recreating a list of changes from transient modifications to the hierarchy
> is like opening a can of worms. Rethinking things it turned out, that it is
> much simpler to directly consolidate the change log. This approach is much
> more flexible and versatile since knowledge of the persisted hierarchy is
> not necessary at all. Thus this approach should work equally well with
> change logs for jr3, the Microkernel, Jackrabbit core, JCR and SPI.
>
> The algorithm is capable of creating minimal change logs from any valid
> change log (that is from any change log that can be applied to some
> hierarchy).
>
> I extensively tested the algorithm on randomly created change logs [3]. This
> resulted in some interesting numbers: consolidation results in an decrease
> of about 45% of these randomly generated change logs. Furthermore the
> resulting size of the H2 database files *decreased up to 200%*!
>
> The implementation [2] is currently part of the jackrabbit-microkernel
> module in the sandbox. However it does not have any crucial dependencies so
> it should be easy to adapt for other usages.
>
> The gory details of the algorithm are based on some algebraic properties of
> move operations on paths as detailed in the following paragraphs.
>
> The change log consolidation algorithm implemented in the reduce method [2]
> is based on algebraic properties of move operations on paths. The other
> operations (add node, remove node and set property) are generalized to move
> operations. Consolidation relies on reduction and commutation rules of move
> operations.
>
> A move operation resembles a map on a hierarchy (of nodes and properties). A
> change log consisting of k move operations m_1 to m_k is thus the
> composition of the individual moves: m_1 *, ..., * m_k (using * for function
> composition: f(g(x)) = (f * g)(x)).
>
> Some definitions, notation and propositions:
>
> * Let NIL denote a path which never occurs in any hierarchy.
>
> * Order on paths: let p, q be paths.
>  - p < q iff p != NIL and q != NIL and p is an ancestor of q.
>  - p <= q iff p < q or p == q != NIL
>
> * Conflict of paths: let p, q be paths.
>  - p ~ q (p conflicts with q) iff either p <= q or q <= p
>
> * Substitution in paths: let p, q, r be paths.
>  - [p -> q]r = r if p is not an ancestor of r and
>  - [p -> q]r = s where s is the path resulting from replacing
>    the ancestor p in r with q otherwise.
>
> * Let p, q be paths. Then p:q denotes a move operation where the
>  node at p is moved to a new node at q.
>
> * Valid moves: leq p, q be paths.
>  - p:q is valid iff p !~ q or p = q
>  - if p:q is not valid, it is invalid
>
> Invalid moves are exactly those which would result in a node being moved to
> an ancestor/descendant of itself.
>
> * Identity on moves: let p, q be paths.
>  - p:q = ID iff p = q.
>
> * Conflict on moves: let p, q, r, s be paths.
>  - p:q ~ r:s (p:q conflicts with r:s) iff either p ~ r or p ~ s
>    or q ~ r or q ~ s.
>
> * Strict commutativity of moves: let m, n be moves.
>  - m * n = n * m iff m !~ n
>
> * Substitutions in moves: let p, q, r, s be paths.
>  - [p -> q]r:s = [p -> q]r:[p -> q]s
>
> * Let p be a path and let +p denote an add node operation and let -p
>  denote a remove node operation for a node at path p.
>  - +p = NIL:p That is, adding a node is represented by a move from a
>    unknown source.
>  - p = p:NIL. That is, removing a node is represented by a move to an
>    unknown sink.
>
>
> Let m = p:q, n = r:s with p, q, r, s != NIL be valid moves with m != ID and
> n != ID. Then the following reduction and commutation rules apply:
>
> 1.  p!~ r:  m * n = n * m
> 2.  p < r:  illegal (since this implies q <= r which implies p ~ q and
>            thus m invalid)
> 3.  p = r:  illegal (since this implies q <= r which implies p ~ q and
>            this m invalid)
> 4.  p > r:  does not commute if q < s. Otherwise m * n = n * [r -> s]m
> 5.  p!~ s:  m * n = n * m
> 6.  p < s:  illegal (since this implies p ~ q and thus m invalid)
> 7.  p = s:  does not commute
> 8.  p > s:  illegal (since p > s implies there is an s already which
>            will conflict with r:s)
> 9.  q!~ r:  m * n = n * m
> 10. q < r:  m * n = [q -> p]n * m
> 11. q = r:  m * n = p:s (transitivity of moves)
> 12. q > r:  m * n = n * [r -> s]m
> 13. q!~ s:  m * n = n * m
> 14. q < s:  does not commute if p > r. Otherwise m * n = [q -> p]n * m
> 15. q = s:  illegal (since s conflicts with r:s)
> 16. q > s:  illegal (since s conflicts with r:s)
>
> Allowing add node and remove node operations the following additional
> conditions apply:
>
> Let m = p:q, n = r:s be valid moves with m != ID and n != ID. Then the
> reduction and commutations rules 1. to 16. apply with extra conditions on
> 4., 10., 12. and 14.:
>
> 4'.  if s = NIL and q = NIL then m * n = -r. Otherwise if s = NIL then
>     m, n do not commute.
> 10'. illegal if p = NIL
> 12'. if s = NIL then m * n = -r * -p
> 14'. illegal if p = NIL

sounds very complicated ;) i admit that i don't
fully understand it (ok, i didn't try very hard
and gave up somewhere in the middle).

could you please provide a simple example?

what is the benefit of this approach compared
to the imlementation in jackrabbit-core
(which does consolidate transient changes)?

cheers
stefan

>
> Michael
>
>
> [1] http://markmail.org/message/qkkcvtmtapas2cx4
> [2]
> http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbit-microkernel/src/main/java/org/apache/jackrabbit/state/ChangeLog.java?view=markup
> [3]
> http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbit-microkernel/src/test/java/org/apache/jackrabbit/state/ChangeLogFuzzTest.java?view=markup

Mime
View raw message