jackrabbit-oak-dev mailing list archives

From Jukka Zitting <jukka.zitt...@gmail.com>
Subject Re: Observation design (Was: svn commit: r1351414 - in /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak: api/ChangeSet.java api/ContentSession.java core/ContentSessionImpl.java)
Date Wed, 20 Jun 2012 10:35:25 GMT
Hi,

On Tue, Jun 19, 2012 at 11:45 PM, Michael Dürig <mduerig@apache.org> wrote:
> But this is no different with polling not backed by the Microkernel journal:
> when a client takes a long time to digest changes this might cause the next
> poll to be deferred so much that the relevant revisions are not available
> any more.

That's still a problem, but a somewhat different one (see the earlier
discussion about revision lifetimes and leases). The changeset
approach requires that *all* revisions since the last observed one are
still available, whereas the polling approach just requires *two*
revisions to be available: the last observed one and the very latest
one.
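
To illustrate the polling side of that comparison, here is a minimal sketch
that derives changes from just two snapshots. The class name PollingDiff, the
diff method and the flat property-to-value map are all hypothetical and are
not part of the Oak or MicroKernel API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch, not the Oak or MicroKernel API: a revision is
// modelled as a flat map from property name to value.
final class PollingDiff {

    // Reports one change per property whose value differs between the
    // last observed revision and the latest one. Revisions in between
    // are never consulted, so they do not need to be retained.
    static List<String> diff(Map<String, String> lastObserved,
                             Map<String, String> latest) {
        Set<String> names = new TreeSet<>(lastObserved.keySet());
        names.addAll(latest.keySet());
        List<String> changes = new ArrayList<>();
        for (String name : names) {
            String before = lastObserved.get(name);
            String after = latest.get(name);
            if (!Objects.equals(before, after)) {
                changes.add(name + ": " + before + " -> " + after);
            }
        }
        return changes;
    }

    public static void main(String[] args) {
        Map<String, String> lastObserved = Map.of("P", "X");
        Map<String, String> latest = Map.of("P", "Z", "Q", "N");
        // Prints [P: X -> Z, Q: null -> N]
        System.out.println(diff(lastObserved, latest));
    }
}
```

The trade-off is visible in the output: any intermediate value of P between
the two snapshots is not reported, which is exactly why only two revisions
have to stay available.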

In a write-heavy deployment we could easily see hundreds of revisions
per second. A system that wants to preserve the entire journal for
even just an hour could face the need to keep track of something like
a million revisions, potentially many more. I don't think that's
feasible, at least not as a general solution.
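
As a rough check of that estimate, assuming a sustained rate r of revisions
per second (the figures 100 and 300 are illustrative, not measurements):

```latex
N_{\text{hour}} = r \times 3600\,\mathrm{s}, \qquad
r = 100/\mathrm{s} \Rightarrow N_{\text{hour}} = 360{,}000, \qquad
r = 300/\mathrm{s} \Rightarrow N_{\text{hour}} = 1{,}080{,}000
```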

There may well be deployments where we *do* want to keep detailed
audit logs of everything anyone has done, but I'd rather handle that
as an optional extension than a core part of the API.

> There is a linear order of the events on each cluster node. The order is
> just not the same for all of them. As I said, a cluster sync is just viewed
> as changes applied by any other session. So its all in the journal.

If you do view the cluster sync as a change applied by another
session, then how do you handle user data and other event details from
the potentially many changes that got applied by perhaps multiple
different sessions on the other cluster node?

For example, consider the following scenario with cluster nodes A and B:

A: set property P from X to Y at time 1 with user data K
B: set property P from X to Y at time 2 with user data L
B: set property P from Y to Z at time 3 with user data M
B: set property Q with user data N

When A syncs with B, the resulting property P would presumably be set
to Z (the only sane way of merging such changes), but which events and
what user data will an observer on A see? Will user data L ever be
seen by an observer on A? Will M? If yes, what is the sequence of
property P changes seen by an observer on A: X -> Y -> Z or X -> Y, X
-> Y -> Z?
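
To make the question concrete, here is a minimal sketch of one possible
reading, in which the sync shows up on A as a single change. It does not
describe how Oak behaves. The names ClusterSyncSketch, Change, applyAll and
userData are hypothetical, and since the scenario gives no value for Q, "q1"
is a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the scenario; not the Oak API.
final class ClusterSyncSketch {

    // One change as recorded on the node where it was made.
    static final class Change {
        final String property;
        final String before;
        final String after;
        final String userData;

        Change(String property, String before, String after, String userData) {
            this.property = property;
            this.before = before;
            this.after = after;
            this.userData = userData;
        }
    }

    // Applies the remote changes on top of the local state, last write wins.
    static Map<String, String> applyAll(Map<String, String> local, List<Change> remote) {
        Map<String, String> merged = new TreeMap<>(local);
        for (Change change : remote) {
            merged.put(change.property, change.after);
        }
        return merged;
    }

    // Collects the user data values attached to the remote changes, in order.
    static List<String> userData(List<Change> remote) {
        List<String> values = new ArrayList<>();
        for (Change change : remote) {
            values.add(change.userData);
        }
        return values;
    }

    public static void main(String[] args) {
        // State on A after its own change (P: X -> Y, user data K).
        Map<String, String> onA = Map.of("P", "Y");
        // Changes made on B; "q1" is a placeholder value for Q.
        List<Change> fromB = List.of(
                new Change("P", "X", "Y", "L"),
                new Change("P", "Y", "Z", "M"),
                new Change("Q", null, "q1", "N"));

        // Prints {P=Z, Q=q1}
        System.out.println(applyAll(onA, fromB));
        // Prints [L, M, N]
        System.out.println(userData(fromB));
    }
}
```

Under this reading an observer on A would see P go from Y to Z and Q appear,
while three distinct user data values (L, M, N) compete for a single sync
change. That is the ambiguity the questions above are asking about.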

BR,

Jukka Zitting
