From dev-return-34043-apmail-jackrabbit-dev-archive=jackrabbit.apache.org@jackrabbit.apache.org Sat Feb 4 20:08:16 2012
Return-Path:
X-Original-To: apmail-jackrabbit-dev-archive@www.apache.org
Delivered-To: apmail-jackrabbit-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
by minotaur.apache.org (Postfix) with SMTP id 73B069D02
for ; Sat, 4 Feb 2012 20:08:16 +0000 (UTC)
Received: (qmail 43673 invoked by uid 500); 4 Feb 2012 20:08:16 -0000
Delivered-To: apmail-jackrabbit-dev-archive@jackrabbit.apache.org
Received: (qmail 43313 invoked by uid 500); 4 Feb 2012 20:08:15 -0000
Mailing-List: contact dev-help@jackrabbit.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@jackrabbit.apache.org
Delivered-To: mailing list dev@jackrabbit.apache.org
Received: (qmail 43301 invoked by uid 99); 4 Feb 2012 20:08:15 -0000
Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136)
by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 04 Feb 2012 20:08:15 +0000
X-ASF-Spam-Status: No, hits=-0.7 required=5.0
tests=RCVD_IN_DNSWL_LOW,SPF_PASS
X-Spam-Check-By: apache.org
Received-SPF: pass (athena.apache.org: domain of stefan.guggisberg@gmail.com designates 209.85.210.170 as permitted sender)
Received: from [209.85.210.170] (HELO mail-iy0-f170.google.com) (209.85.210.170)
by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 04 Feb 2012 20:08:09 +0000
Received: by iakk32 with SMTP id k32so13280729iak.1
for ; Sat, 04 Feb 2012 12:07:49 -0800 (PST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
d=gmail.com; s=gamma;
h=mime-version:in-reply-to:references:date:message-id:subject:from:to
:content-type:content-transfer-encoding;
bh=Z6ETx4iUAgN5c9sZQk7z1y8jHvZ61lx14BNHZP7LyeI=;
b=ghP6aS2leqfjA/du2bmg8OZu8L4QpQtlVBU7xwPH+yE2atdC03PNQngRK6lqVwc2Cj
cRZDAxFodOtNEDiHjm45qKtYO1T8wifQDj0Ss77V7XNuSmPFhxEXc9RiM09aMGn85shm
691kbn9uwsonvb/NSLyC9WWBTqK3sUZ9feOdk=
MIME-Version: 1.0
Received: by 10.42.152.65 with SMTP id h1mr11337439icw.50.1328386069373; Sat,
04 Feb 2012 12:07:49 -0800 (PST)
Received: by 10.42.147.197 with HTTP; Sat, 4 Feb 2012 12:07:49 -0800 (PST)
In-Reply-To: <4F2C2B0E.7050508@apache.org>
References: <4F2C2B0E.7050508@apache.org>
Date: Sat, 4 Feb 2012 21:07:49 +0100
Message-ID:
Subject: Re: [jr3 microkernel] Change log consolidation
From: Stefan Guggisberg
To: dev@jackrabbit.apache.org
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
On Fri, Feb 3, 2012 at 7:44 PM, Michael D=FCrig wrote:
>
> Hi,
>
> While working on a new transient space implementation [1] I discovered th=
at
> recreating a list of changes from transient modifications to the hierarch=
y
> is like opening a can of worms. Rethinking things it turned out, that it =
is
> much simpler to directly consolidate the change log. This approach is muc=
h
> more flexible and versatile since knowledge of the persisted hierarchy is
> not necessary at all. Thus this approach should work equally well with
> change logs for jr3, the Microkernel, Jackrabbit core, JCR and SPI.
>
> The algorithm is capable of creating minimal change logs from any valid
> change log (that is from any change log that can be applied to some
> hierarchy).
>
> I extensively tested the algorithm on randomly created change logs [3]. T=
his
> resulted in some interesting numbers: consolidation results in an decreas=
e
> of about 45% of these randomly generated change logs. Furthermore the
> resulting size of the H2 database files *decreased up to 200%*!
>
> The implementation [2] is currently part of the jackrabbit-microkernel
> module in the sandbox. However it does not have any crucial dependencies =
so
> it should be easy to adapt for other usages.
>
> The gory details of the algorithm are based on some algebraic properties =
of
> move operations on paths as detailed in the following paragraphs.
>
> The change log consolidation algorithm implemented in the reduce method [=
2]
> is based on algebraic properties of move operations on paths. The other
> operations (add node, remove node and set property) are generalized to mo=
ve
> operations. Consolidation relies on reduction and commutation rules of mo=
ve
> operations.
>
> A move operation resembles a map on a hierarchy (of nodes and properties)=
. A
> change log consisting of k move operations m_1 to m_k is thus the
> composition of the individual moves: m_1 *, ..., * m_k (using * for funct=
ion
> composition: f(g(x)) =3D (f * g)(x)).
>
> Some definitions, notation and propositions:
>
> * Let NIL denote a path which never occurs in any hierarchy.
>
> * Order on paths: let p, q be paths.
> =A0- p < q iff p !=3D NIL and q !=3D NIL and p is an ancestor of q.
> =A0- p <=3D q iff p < q or p =3D=3D q !=3D NIL
>
> * Conflict of paths: let p, q be paths.
> =A0- p ~ q (p conflicts with q) iff either p <=3D q or q <=3D p
>
> * Substitution in paths: let p, q, r be paths.
> =A0- [p -> q]r =3D r if p is not an ancestor of r and
> =A0- [p -> q]r =3D s where s is the path resulting from replacing
> =A0 =A0the ancestor p in r with q otherwise.
>
> * Let p, q be paths. Then p:q denotes a move operation where the
> =A0node at p is moved to a new node at q.
>
> * Valid moves: leq p, q be paths.
> =A0- p:q is valid iff p !~ q or p =3D q
> =A0- if p:q is not valid, it is invalid
>
> Invalid moves are exactly those which would result in a node being moved =
to
> an ancestor/descendant of itself.
>
> * Identity on moves: let p, q be paths.
> =A0- p:q =3D ID iff p =3D q.
>
> * Conflict on moves: let p, q, r, s be paths.
> =A0- p:q ~ r:s (p:q conflicts with r:s) iff either p ~ r or p ~ s
> =A0 =A0or q ~ r or q ~ s.
>
> * Strict commutativity of moves: let m, n be moves.
> =A0- m * n =3D n * m iff m !~ n
>
> * Substitutions in moves: let p, q, r, s be paths.
> =A0- [p -> q]r:s =3D [p -> q]r:[p -> q]s
>
> * Let p be a path and let +p denote an add node operation and let -p
> =A0denote a remove node operation for a node at path p.
> =A0- +p =3D NIL:p That is, adding a node is represented by a move from a
> =A0 =A0unknown source.
> =A0- p =3D p:NIL. That is, removing a node is represented by a move to an
> =A0 =A0unknown sink.
>
>
> Let m =3D p:q, n =3D r:s with p, q, r, s !=3D NIL be valid moves with m !=
=3D ID and
> n !=3D ID. Then the following reduction and commutation rules apply:
>
> 1. =A0p!~ r: =A0m * n =3D n * m
> 2. =A0p < r: =A0illegal (since this implies q <=3D r which implies p ~ q =
and
> =A0 =A0 =A0 =A0 =A0 =A0thus m invalid)
> 3. =A0p =3D r: =A0illegal (since this implies q <=3D r which implies p ~ =
q and
> =A0 =A0 =A0 =A0 =A0 =A0this m invalid)
> 4. =A0p > r: =A0does not commute if q < s. Otherwise m * n =3D n * [r -> =
s]m
> 5. =A0p!~ s: =A0m * n =3D n * m
> 6. =A0p < s: =A0illegal (since this implies p ~ q and thus m invalid)
> 7. =A0p =3D s: =A0does not commute
> 8. =A0p > s: =A0illegal (since p > s implies there is an s already which
> =A0 =A0 =A0 =A0 =A0 =A0will conflict with r:s)
> 9. =A0q!~ r: =A0m * n =3D n * m
> 10. q < r: =A0m * n =3D [q -> p]n * m
> 11. q =3D r: =A0m * n =3D p:s (transitivity of moves)
> 12. q > r: =A0m * n =3D n * [r -> s]m
> 13. q!~ s: =A0m * n =3D n * m
> 14. q < s: =A0does not commute if p > r. Otherwise m * n =3D [q -> p]n * =
m
> 15. q =3D s: =A0illegal (since s conflicts with r:s)
> 16. q > s: =A0illegal (since s conflicts with r:s)
>
> Allowing add node and remove node operations the following additional
> conditions apply:
>
> Let m =3D p:q, n =3D r:s be valid moves with m !=3D ID and n !=3D ID. The=
n the
> reduction and commutations rules 1. to 16. apply with extra conditions on
> 4., 10., 12. and 14.:
>
> 4'. =A0if s =3D NIL and q =3D NIL then m * n =3D -r. Otherwise if s =3D N=
IL then
> =A0 =A0 m, n do not commute.
> 10'. illegal if p =3D NIL
> 12'. if s =3D NIL then m * n =3D -r * -p
> 14'. illegal if p =3D NIL
sounds very complicated ;) i admit that i don't
fully understand it (ok, i didn't try very hard
and gave up somewhere in the middle).
could you please provide a simple example?
what is the benefit of this approach compared
to the imlementation in jackrabbit-core
(which does consolidate transient changes)?
cheers
stefan
>
> Michael
>
>
> [1] http://markmail.org/message/qkkcvtmtapas2cx4
> [2]
> http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbit-microkernel/sr=
c/main/java/org/apache/jackrabbit/state/ChangeLog.java?view=3Dmarkup
> [3]
> http://svn.apache.org/viewvc/jackrabbit/sandbox/jackrabbit-microkernel/sr=
c/test/java/org/apache/jackrabbit/state/ChangeLogFuzzTest.java?view=3Dmarku=
p