jackrabbit-oak-dev mailing list archives

From Michael Dürig <mdue...@apache.org>
Subject Re: Observation design (Was: svn commit: r1351414 - in /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak: api/ChangeSet.java api/ContentSession.java core/ContentSessionImpl.java)
Date Wed, 20 Jun 2012 11:26:43 GMT

On 20.6.12 11:35, Jukka Zitting wrote:
> Hi,
> On Tue, Jun 19, 2012 at 11:45 PM, Michael Dürig<mduerig@apache.org>  wrote:
>> But this is no different with polling not backed by the Microkernel journal:
>> when a client takes a long time to digest changes this might cause the next
>> poll to be deferred so much that the relevant revisions are not available
>> any more.
> That's still a problem, but a somewhat different one (see the earlier
> discussion about revision lifetimes and leases). The changeset
> approach requires that *all* revisions since the last observed one are
> still available, whereas the polling approach just requires *two*
> revisions to be available: the last observed one and the very latest
> one.

Yes, but with or without such a lease mechanism my approach is more
general and doesn't hurt anything: if older revisions are still
available, it generates a more fine-grained set of events. If they are
not available any more, it gracefully degrades to your approach. In the
extreme case where only two revisions (last observed and latest) are
available, it is exactly the same as your approach.
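
To make the degradation concrete, here is a rough sketch of how the
revision pairs to diff could be picked. It is only an illustration:
RevisionSteps, steps() and the use of increasing long ids for revisions
are my assumptions here, not the Microkernel API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical sketch: revisions are modelled as increasing long ids,
// which is an assumption and not the Microkernel's revision format.
class RevisionSteps {

    // Returns the (from, to) pairs to diff between lastObserved and latest,
    // stepping through every revision that is still available in between.
    // Assumes lastObserved <= latest and that latest is in 'available'.
    static List<long[]> steps(NavigableSet<Long> available, long lastObserved, long latest) {
        List<long[]> pairs = new ArrayList<>();
        long from = lastObserved;
        for (long to : available.subSet(lastObserved, false, latest, true)) {
            pairs.add(new long[] {from, to});
            from = to;
        }
        return pairs;
    }

    public static void main(String[] args) {
        // All intermediate revisions retained: one diff per revision.
        NavigableSet<Long> all = new TreeSet<>(Arrays.asList(1L, 2L, 3L, 4L));
        // Only the two endpoints retained: a single coarse diff.
        NavigableSet<Long> endpointsOnly = new TreeSet<>(Arrays.asList(1L, 4L));
        for (long[] p : steps(all, 1L, 4L)) System.out.println(p[0] + " -> " + p[1]);
        for (long[] p : steps(endpointsOnly, 1L, 4L)) System.out.println(p[0] + " -> " + p[1]);
    }
}
```

With every revision retained this yields one diff per revision; with
only the endpoints retained it yields the single diff that plain
polling would compute.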

> In a write-heavy deployment we could easily see hundreds of revisions
> per second. A system that wants to preserve the entire journal for
> even just an hour could face the need to keep track of something like
> a million revisions, potentially much more. I don't think that's a
> feasible approach at least as a general solution.
> There may well be deployments where we *do* want to keep detailed
> audit logs of everything anyone has done, but I'd rather handle that
> as an optional extension than a core part of the API.
>> There is a linear order of the events on each cluster node. The order is
>> just not the same for all of them. As I said, a cluster sync is just viewed
>> as changes applied by any other session. So its all in the journal.
> If you do view the cluster sync as a change applied by another
> session, then how do you handle user data and other event details from
> the potentially many changes that got applied by perhaps multiple
> different sessions on the other cluster node?

Just compare the states from before and after the sync to calculate
the events. I wouldn't forward any user data from remote sessions,
since I'd like to view a cluster sync as changes applied by a "sync"
session which looks just like any other session to the user.

In other words: the observable results should be the same as if a user
"sync" had created a session and manually merged the differences
between the cluster nodes.
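
A minimal sketch of what I mean by comparing the two states, assuming
(purely for illustration) that a state is a flat map of property names
to non-null string values. SyncDiff and diff() are hypothetical names,
not Oak API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a state is a flat map of property name to value.
// Real states are trees; this simplification is an assumption.
class SyncDiff {

    // Derives events purely from the before and after states of the sync.
    // No user data from remote sessions is involved. Values must be non-null.
    static List<String> diff(Map<String, String> before, Map<String, String> after) {
        List<String> events = new ArrayList<>();
        Set<String> names = new TreeSet<>(before.keySet());
        names.addAll(after.keySet());
        for (String name : names) {
            String oldValue = before.get(name);
            String newValue = after.get(name);
            if (oldValue == null) {
                events.add("added " + name + "=" + newValue);
            } else if (newValue == null) {
                events.add("removed " + name);
            } else if (!oldValue.equals(newValue)) {
                events.add("changed " + name + ": " + oldValue + " -> " + newValue);
            }
        }
        return events;
    }

    public static void main(String[] args) {
        Map<String, String> before = new LinkedHashMap<>();
        before.put("P", "Y");
        Map<String, String> after = new LinkedHashMap<>();
        after.put("P", "Z");
        after.put("Q", "V"); // "V" is a made-up value for illustration
        for (String event : diff(before, after)) System.out.println(event);
    }
}
```

Whatever happened on the remote node between the two states is
invisible here; only the net difference turns into events.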

> For example, consider the following scenario with cluster nodes A and B:
> A: set property P from X to Y at time 1 with user data K
> B: set property P from X to Y at time 2 with user data L
> B: set property P from Y to Z at time 3 with user data M
> B: set property Q with user data N
> When A syncs with B, the resulting property P would presumably be set
> to Z (the only sane way of merging such changes), but which events and
> what user data will an observer on A see? Will user data L ever be
> seen by an observer on A? Will M? If yes, what is the sequence of
> property P changes seen by an observer on A: X ->  Y ->  Z or X ->  Y, X
> ->  Y ->  Z?

The (imaginary) sync session for A would set P from Y to Z and set
property Q. An observer on A would see X -> Y (A's own change),
followed by Y -> Z and the setting of Q (both changes made by the sync
session).
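
Spelled out as a toy replay of A's local journal (again only a sketch:
ScenarioOnA, Commit and the value "V" for Q are made up for
illustration, and removals are ignored to keep it short):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of what an observer on cluster node A would see.
class ScenarioOnA {

    // One entry in A's local journal: the committing session, its user
    // data (null for the sync session) and the properties after the commit.
    static final class Commit {
        final String session;
        final String userData;
        final Map<String, String> state;

        Commit(String session, String userData, Map<String, String> state) {
            this.session = session;
            this.userData = userData;
            this.state = state;
        }
    }

    // Builds an insertion-ordered property map from name/value pairs.
    static Map<String, String> props(String... nameValuePairs) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < nameValuePairs.length; i += 2) {
            map.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        return map;
    }

    // Replays the journal and emits one event per added or changed property.
    static List<String> observe(Map<String, String> base, List<Commit> journal) {
        List<String> events = new ArrayList<>();
        Map<String, String> previous = base;
        for (Commit commit : journal) {
            for (Map.Entry<String, String> e : commit.state.entrySet()) {
                String old = previous.get(e.getKey());
                if (!e.getValue().equals(old)) {
                    events.add(commit.session + ": " + e.getKey() + " "
                            + (old == null ? "(none)" : old) + " -> " + e.getValue()
                            + " [userData=" + commit.userData + "]");
                }
            }
            previous = commit.state;
        }
        return events;
    }

    // A's journal: its own change (user data K), then one sync commit.
    static List<String> scenario() {
        List<Commit> journal = Arrays.asList(
                new Commit("local", "K", props("P", "Y")),
                new Commit("sync", null, props("P", "Z", "Q", "V")));
        return observe(props("P", "X"), journal);
    }

    public static void main(String[] args) {
        for (String event : scenario()) System.out.println(event);
    }
}
```

User data L, M and N from B's sessions never reach A's journal in this
model, so no observer on A can see them.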


> BR,
> Jukka Zitting
