From dev-return-34038-apmail-jackrabbit-dev-archive=jackrabbit.apache.org@jackrabbit.apache.org Fri Feb 3 18:45:03 2012
Return-Path:
X-Original-To: apmail-jackrabbit-dev-archive@www.apache.org
Delivered-To: apmail-jackrabbit-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
by minotaur.apache.org (Postfix) with SMTP id 94FFC9146
for ; Fri, 3 Feb 2012 18:45:03 +0000 (UTC)
Received: (qmail 18992 invoked by uid 500); 3 Feb 2012 18:45:03 -0000
Delivered-To: apmail-jackrabbit-dev-archive@jackrabbit.apache.org
Received: (qmail 18943 invoked by uid 500); 3 Feb 2012 18:45:02 -0000
Mailing-List: contact dev-help@jackrabbit.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@jackrabbit.apache.org
Delivered-To: mailing list dev@jackrabbit.apache.org
Received: (qmail 18936 invoked by uid 99); 3 Feb 2012 18:45:02 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136)
by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 03 Feb 2012 18:45:02 +0000
X-ASF-Spam-Status: No, hits=-1.6 required=5.0
tests=RCVD_IN_DNSWL_MED,SPF_NEUTRAL
X-Spam-Check-By: apache.org
Received-SPF: neutral (athena.apache.org: local policy)
Received: from [64.18.1.37] (HELO exprod6og116.obsmtp.com) (64.18.1.37)
by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 03 Feb 2012 18:44:56 +0000
Received: from outbound-smtp-2.corp.adobe.com ([193.104.215.16]) by exprod6ob116.postini.com ([64.18.5.12]) with SMTP
ID DSNKTywrEv27K6m9bRr7HUzPi7F+2gC9D9zI@postini.com; Fri, 03 Feb 2012 10:44:35 PST
Received: from inner-relay-4.eur.adobe.com (inner-relay-4b [10.128.4.237])
by outbound-smtp-2.corp.adobe.com (8.12.10/8.12.10) with ESMTP id q13IiYhO020158
for ; Fri, 3 Feb 2012 10:44:34 -0800 (PST)
Received: from nacas03.corp.adobe.com (nacas03.corp.adobe.com [10.8.189.121])
by inner-relay-4.eur.adobe.com (8.12.10/8.12.9) with ESMTP id q13IiXPl003064
for ; Fri, 3 Feb 2012 10:44:33 -0800 (PST)
Received: from eurhub01.eur.adobe.com (10.128.4.30) by nacas03.corp.adobe.com
(10.8.189.121) with Microsoft SMTP Server (TLS) id 8.3.192.1; Fri, 3 Feb 2012
10:44:32 -0800
Received: from susi.local (10.136.141.246) by eurhub01.eur.adobe.com
(10.128.4.111) with Microsoft SMTP Server id 8.3.192.1; Fri, 3 Feb 2012
18:44:31 +0000
Message-ID: <4F2C2B0E.7050508@apache.org>
Date: Fri, 3 Feb 2012 18:44:30 +0000
From: =?ISO-8859-1?Q?Michael_D=FCrig?=
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:10.0) Gecko/20120129 Thunderbird/10.0
MIME-Version: 1.0
To: "dev@jackrabbit.apache.org"
Subject: [jr3 microkernel] Change log consolidation
Content-Type: text/plain; charset="ISO-8859-1"; format=flowed
Content-Transfer-Encoding: 7bit
Hi,
While working on a new transient space implementation [1] I discovered
that recreating a list of changes from transient modifications to the
hierarchy is like opening a can of worms. Rethinking things it turned
out, that it is much simpler to directly consolidate the change log.
This approach is much more flexible and versatile since knowledge of the
persisted hierarchy is not necessary at all. Thus this approach should
work equally well with change logs for jr3, the Microkernel, Jackrabbit
core, JCR and SPI.
The algorithm is capable of creating minimal change logs from any valid
change log (that is from any change log that can be applied to some
hierarchy).
I extensively tested the algorithm on randomly created change logs [3].
This resulted in some interesting numbers: consolidation results in an
decrease of about 45% of these randomly generated change logs.
Furthermore the resulting size of the H2 database files *decreased up to
200%*!
The implementation [2] is currently part of the jackrabbit-microkernel
module in the sandbox. However it does not have any crucial dependencies
so it should be easy to adapt for other usages.
The gory details of the algorithm are based on some algebraic properties
of move operations on paths as detailed in the following paragraphs.
The change log consolidation algorithm implemented in the reduce method
[2] is based on algebraic properties of move operations on paths. The
other operations (add node, remove node and set property) are
generalized to move operations. Consolidation relies on reduction and
commutation rules of move operations.
A move operation resembles a map on a hierarchy (of nodes and
properties). A change log consisting of k move operations m_1 to m_k is
thus the composition of the individual moves: m_1 *, ..., * m_k (using *
for function composition: f(g(x)) = (f * g)(x)).
Some definitions, notation and propositions:
* Let NIL denote a path which never occurs in any hierarchy.
* Order on paths: let p, q be paths.
- p < q iff p != NIL and q != NIL and p is an ancestor of q.
- p <= q iff p < q or p == q != NIL
* Conflict of paths: let p, q be paths.
- p ~ q (p conflicts with q) iff either p <= q or q <= p
* Substitution in paths: let p, q, r be paths.
- [p -> q]r = r if p is not an ancestor of r and
- [p -> q]r = s where s is the path resulting from replacing
the ancestor p in r with q otherwise.
* Let p, q be paths. Then p:q denotes a move operation where the
node at p is moved to a new node at q.
* Valid moves: leq p, q be paths.
- p:q is valid iff p !~ q or p = q
- if p:q is not valid, it is invalid
Invalid moves are exactly those which would result in a node being moved
to an ancestor/descendant of itself.
* Identity on moves: let p, q be paths.
- p:q = ID iff p = q.
* Conflict on moves: let p, q, r, s be paths.
- p:q ~ r:s (p:q conflicts with r:s) iff either p ~ r or p ~ s
or q ~ r or q ~ s.
* Strict commutativity of moves: let m, n be moves.
- m * n = n * m iff m !~ n
* Substitutions in moves: let p, q, r, s be paths.
- [p -> q]r:s = [p -> q]r:[p -> q]s
* Let p be a path and let +p denote an add node operation and let -p
denote a remove node operation for a node at path p.
- +p = NIL:p That is, adding a node is represented by a move from a
unknown source.
- p = p:NIL. That is, removing a node is represented by a move to an
unknown sink.
Let m = p:q, n = r:s with p, q, r, s != NIL be valid moves with m != ID
and n != ID. Then the following reduction and commutation rules apply:
1. p!~ r: m * n = n * m
2. p < r: illegal (since this implies q <= r which implies p ~ q and
thus m invalid)
3. p = r: illegal (since this implies q <= r which implies p ~ q and
this m invalid)
4. p > r: does not commute if q < s. Otherwise m * n = n * [r -> s]m
5. p!~ s: m * n = n * m
6. p < s: illegal (since this implies p ~ q and thus m invalid)
7. p = s: does not commute
8. p > s: illegal (since p > s implies there is an s already which
will conflict with r:s)
9. q!~ r: m * n = n * m
10. q < r: m * n = [q -> p]n * m
11. q = r: m * n = p:s (transitivity of moves)
12. q > r: m * n = n * [r -> s]m
13. q!~ s: m * n = n * m
14. q < s: does not commute if p > r. Otherwise m * n = [q -> p]n * m
15. q = s: illegal (since s conflicts with r:s)
16. q > s: illegal (since s conflicts with r:s)
Allowing add node and remove node operations the following additional
conditions apply:
Let m = p:q, n = r:s be valid moves with m != ID and n != ID. Then the
reduction and commutations rules 1. to 16. apply with extra conditions
on 4., 10., 12. and 14.:
4'. if s = NIL and q = NIL then m * n = -r. Otherwise if s = NIL then
m, n do not commute.
10'. illegal if p = NIL
12'. if s = NIL then m * n = -r * -p
14'. illegal if p = NIL
Michael
[1] http://markmail.org/message/qkkcvtmtapas2cx4
[2]
http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbit-microkernel/src/main/java/org/apache/jackrabbit/state/ChangeLog.java?view=markup
[3]
http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbit-microkernel/src/test/java/org/apache/jackrabbit/state/ChangeLogFuzzTest.java?view=markup