Message-ID: <4FE1B373.2060308@apache.org>
Date: Wed, 20 Jun 2012 12:26:43 +0100
From: Michael Dürig
Reply-To: oak-dev@jackrabbit.apache.org
Subject: Re: Observation design (Was: svn commit: r1351414 - in /jackrabbit/oak/trunk/oak-core/src/main/java/org/apache/jackrabbit/oak: api/ChangeSet.java api/ContentSession.java core/ContentSessionImpl.java)
References: <4FDF7B57.5090508@apache.org> <4FE08275.9050906@apache.org> <4FE0F30D.2070200@apache.org>

On 20.6.12 11:35, Jukka Zitting wrote:
> Hi,
>
> On Tue, Jun 19, 2012 at 11:45 PM, Michael Dürig wrote:
>> But this is no different with polling not backed by the Microkernel journal:
>> when a client takes a long time to digest changes this might cause the next
>> poll to be deferred so much that the relevant revisions are not available
>> any more.
>
> That's still a problem, but a somewhat different one (see the earlier
> discussion about revision lifetimes and leases). The changeset
> approach requires that *all* revisions since the last observed one are
> still available, whereas the polling approach just requires *two*
> revisions to be available: the last observed one and the very latest
> one.

Yes, but... with or without such a lease mechanism, my approach is more
general and doesn't hurt anything: if older revisions are available, my
approach generates a more fine-grained set of events. If older revisions
are not available any more, it just gracefully degrades to your approach.
In the extreme case, where only two revisions (last observed and latest)
are available, it is the same as your approach.

> In a write-heavy deployment we could easily see hundreds of revisions
> per second. A system that wants to preserve the entire journal for
> even just an hour could face the need to keep track of something like
> a million revisions, potentially much more. I don't think that's a
> feasible approach at least as a general solution.
>
> There may well be deployments where we *do* want to keep detailed
> audit logs of everything anyone has done, but I'd rather handle that
> as an optional extension than a core part of the API.
>
>> There is a linear order of the events on each cluster node. The order is
>> just not the same for all of them. As I said, a cluster sync is just viewed
>> as changes applied by any other session. So its all in the journal.
>
> If you do view the cluster sync as a change applied by another
> session, then how do you handle user data and other event details from
> the potentially many changes that got applied by perhaps multiple
> different sessions on the other cluster node?

Just compare the states from before the sync and after the sync to
calculate the events. I wouldn't forward any user data from remote
sessions, since I'd like to view a cluster sync as changes applied by a
"sync" session, which looks just like any other session to the user. In
other words: the observable results should be the same as if a user
"sync" had created a session and manually merged the differences of the
cluster nodes.
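
To make the idea a bit more concrete, here is a minimal sketch in Java. It is not Oak code: the class `StateDiffSketch`, the methods `state`, `diff` and `observe`, and the representation of a revision as a flat map of property names to values are all assumptions made for illustration. It shows events being derived purely by comparing two states (so no user data is carried along), and the event granularity following from however many revisions happen to be still available.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeSet;

// Hypothetical sketch, not part of the Oak API.
class StateDiffSketch {

    // Builds an illustrative snapshot of one revision: property name -> value.
    static Map<String, String> state(String... keyValues) {
        Map<String, String> s = new LinkedHashMap<>();
        for (int i = 0; i < keyValues.length; i += 2) {
            s.put(keyValues[i], keyValues[i + 1]);
        }
        return s;
    }

    // Derives events only from the difference between two states.
    // Nothing about who made the change (or their user data) is known here.
    static List<String> diff(Map<String, String> before, Map<String, String> after) {
        List<String> events = new ArrayList<>();
        TreeSet<String> names = new TreeSet<>(before.keySet());
        names.addAll(after.keySet());
        for (String name : names) {
            String oldValue = before.get(name);
            String newValue = after.get(name);
            if (Objects.equals(oldValue, newValue)) {
                continue; // unchanged property, no event
            }
            if (oldValue == null) {
                events.add("added " + name + "=" + newValue);
            } else if (newValue == null) {
                events.add("removed " + name);
            } else {
                events.add("changed " + name + ": " + oldValue + " -> " + newValue);
            }
        }
        return events;
    }

    // Emits one diff per consecutive pair of revisions that are still available,
    // oldest first. With only two revisions (last observed and latest) this
    // collapses to a single diff.
    static List<String> observe(List<Map<String, String>> availableRevisions) {
        List<String> events = new ArrayList<>();
        for (int i = 1; i < availableRevisions.size(); i++) {
            events.addAll(diff(availableRevisions.get(i - 1), availableRevisions.get(i)));
        }
        return events;
    }

    public static void main(String[] args) {
        Map<String, String> r1 = state("P", "X");
        Map<String, String> r2 = state("P", "Y");
        Map<String, String> r3 = state("P", "Z", "Q", "1");

        // All intermediate revisions available: fine-grained events.
        System.out.println(observe(List.of(r1, r2, r3)));
        // Intermediate revision gone: coarser events from a single comparison.
        System.out.println(observe(List.of(r1, r3)));
    }
}
```

With all three made-up revisions available, the observer sees P change twice and Q being added; with the middle one gone, it sees a single change of P from X to Z plus the added Q. The same `diff` would apply to the state before and after a cluster sync.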

> For example, consider the following scenario with cluster nodes A and B:
>
> A: set property P from X to Y at time 1 with user data K
> B: set property P from X to Y at time 2 with user data L
> B: set property P from Y to Z at time 3 with user data M
> B: set property Q with user data N
>
> When A syncs with B, the resulting property P would presumably be set
> to Z (the only sane way of merging such changes), but which events and
> what user data will an observer on A see? Will user data L ever be
> seen by an observer on A? Will M? If yes, what is the sequence of
> property P changes seen by an observer on A: X -> Y -> Z or X -> Y, X
> -> Y -> Z?

The (imaginary) sync session for A would set P from Y to Z and set
property Q. An observer on A would see X -> Y (A's own change), followed
by Y -> Z and the setting of Q (both changes made by the sync session).

Michael

>
> BR,
>
> Jukka Zitting