jackrabbit-oak-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael Dürig <mdue...@apache.org>
Subject Re: Observation
Date Fri, 08 Nov 2013 10:28:23 GMT

Hi Alex,

Thanks for the detailed analysis. I generally agree. See inline for some 
further comments.

On 7.11.13 2:49 , Alexander Klimetschek wrote:
> 1) listeners should not get external events by default
> OAK-1121 helps, but applications must say they exclude external
> events, while it should be the other way around (only local by
> default) - this can arguably not change because of the jcr
> observation contract. The current implementation can still be
> optimized a bit, see my comment in the issue.

Ack. I think we shouldn't/can't change this ATM due to the API contract 
and backwards compatibility. Rather should applications facing issues 
with being flooded with observation events redesign their use of 
observation in a way that scales with the cluster deployment. Oak should 
of course provide as much as possible to ease this process (e.g. 
OAK-1133, OAK-1121).

> 2) filtering happens too late within listeners because jcr
> observation API is too simple (OAK-1133, "observation listener
> plus")
> The filter logic would be passed declaratively and allow more options
> as noted in OAK-1133:
> addEventListener(listener, Filter.Path("/libs"),
> Filter.PropertyValue("jcr:content/sling:resourceType", "myValue"),
> ...)
> (just a rough sketch; the simplest solution could simply introduce a
> FilterEventListener interface that extends the existing EventListener
> and provides the filter in a getFilter() method; a simple instanceof
> check within oak-jcr could detect that; but a completely separate API
> might also make sense, see point 6 below)

This wouldn't work for journalled observation (OAK-1145) where we'd also 
need this filtering capabilities. There isn't an event listener involved 
there where to attach such a filter. So we probably need that new API 
where some kind of a filter object is passed directly.

> This could then be evaluated when the NodeStateDiff is calculated.
> What is good now already is that the base path (if specified in the
> jcr observation) is evaluated as soon as getting the diff roots. But
> many real use cases might want to have multiple paths (/apps OR
> /libs) or globbing (/content/site/*/jcr:content) - this could be
> easily and efficiently added here in one place (amongst those other
> filters). Having one or few base paths is in the oak case probably
> the best way to filter things and one should avoid listeners that
> register at root, and only filter on node types for example, which
> requires a lot more tree walking.

On the implementation side we will need a more general, flexible and 
composeable filter mechanism than we currently have. Due to the low 
level nature of these filters (working on the node state diff), these 
filters will most likely be somewhat awkward to use directly. On the API 
side we should therefore provide a convenient way to compose and inject 
such filters. Let's use OAK-1133 to follow up on this.

> In our app (CQ) we have 18 listeners that register for all events,
> because the JCR observation doesn't give them the right filter
> options, so they have to do it themselves. I expect them to be vastly
> improved with this new approach.
> Also, getting events for deleted nodes could be supported here, since
> there is full access to the old NodeState.

This somewhat interact with what we've done on OAK-803: the listening 
session gets refreshed before observation events are delivered. This was 
added to make observation more backward compatible. If we add a way to 
explicitly disable this refreshing the listening session will have 
access to deleted nodes. Added nodes will then not be visible until an 
explicit refresh is done.

> The ideal solution could also be built backwards compatible:
> implement the same filtering with normal JCR API usage and
> automatically wrap the listener in another event listener that
> filters the "manual" way. This allows application code that works
> with both Jackrabbit 2 or Oak with minimal effort (would need some
> helper in jackrabbit-jcr-commons or so). Unless we want to backport
> this to Jackrabbit 2 as well.
> 3) cluster events
> As agreed, external events can be the real scalability issue. It was
> also noted, that many cases do not need it at all (hence 1). But
> there are still cases that need it: in Sling for example, code
> deployment (bundles, JSP scripts, etc.) is based on the JCR
> repository, meaning when you put a bundle in a certain "install"
> folder in the JCR, all cluster instances need to pick it up. However,
> such deployment events are comparably  rare (except on a developer
> machine :)), so the throughput doesn't have to be high. OTOH
> large-scale content changes will (and should) be handled locally,
> thus not requiring external propagation.

I think there are three scenarios for applications:
1. Only rely on cluster local events if possible,
2. otherwise apply a sufficiently specific filter to not to get swamped,
3. come up with a custom solution (i.e. MQ based on an Observer).

> Now the question is if 2) with a
> filter-as-detailed-and-early-as-possible approach solves this already
> (having the right external/non-external flag on all listeners, plus
> detailed filters for the external listeners), or if we could make use
> of the fairly rare nature of those events and try to filter on the
> _source_ instance already and propagate those few events in some
> other, efficient way to other cluster instances. This might improve
> performance as the ChangeDispatcher#externalChange() check (polling
> all 100ms) could then be removed completely. But if the filtering on
> the NodeStateDiff is fast (because listeners are specific), then
> maybe it is ok.
> 4) listener threads
> Currently in oak each event listener gets its own thread. In our app
> CQ we have about 150 listeners, so you end up introducing 150
> threads. I am not sure if this is a good idea.
> Jackrabbit 2 had a single thread, which had the issue of one
> observation listener blocking all others.
> An obvious solution would be a thread pool (probably configurable
> number of threads). And on top of that, if the pool is full and no
> thread is free anymore, one could simply kill blocked handlers after
> a certain timeout (and optionally blacklist them).

Separate threads where introduced with OAK-1113 to expedite event 
delivery, which also to increased compatibility with Sling and solved 
I think we can further improve this by introducing a thread pool. When 
integrating this with the Whiteboard we would put the deployment into 
control on weighting observation throughput against resource usage. I 
wouldn't go so far and kill blocked thread for the same reason Jukka 
mentioned in his reply. However giving the deployment control over the 
thread pool makes it possible to use an unbounded pool that starts out 
with a few threads. Blocking handlers would then lead to higher resource 
usage while events are still being delivered to non blocked listeners.

> 5) filtering in central thread?
> There was some discussion about the current filtering evaluation [0].
> Each listener has its own ChangeProcessor which in turn has its own
> ChangeDispatcher.Listener. This means filtering is not in a single,
> central thread, but happens in each listener's thread (they each have
> their own ChangeSet queue, which indeed might be different because
> the previous root NodeState to compare to might be different for
> each).
> I thought this was not optimal but it seems I was wrong - splitting
> that up and generating events in a central thread and then passing
> them in a queue to the listeners actually turned out to scale worse
> (in a quick test with many parallel listeners). See my patch at [1].
> I guess it is that reading asynchronously from the immutable
> NodeStates is more efficient than multiple blocking queues. Which
> speaks good for the underlying oak implementation :)
> [0] http://markmail.org/thread/533orsfr44wllvrx [1]
> https://github.com/alexkli/jackrabbit-oak/commit/aee631ded4996194a3ad0dec1fc7a9917f7123b8

Good to have someone comment who actually got his hands dirty before 
being smart ;-)

>  6) long-running session just for observation
> One problem with JCR observation is that you need one session open
> all the time "just for the observation". This is done to be able to
> run observation under the permissions of a certain user. But nobody
> in practice uses the session for anything else than reading data upon
> events (which mostly is done to filter only, see 2); when you need to
> write things in an event, best practice is to create a new session on
> demand an close it again. With Oak's refresh() policy this is even
> more important.
> Maybe a new API from 2) could skip the requirement of an open
> session. Registration would be based upon a session to handle the
> credentials easily, but that session could be closed without killing
> the listener. You would get back a special listener object that would
> need to be kept alive and referenced to be able to clean up listeners
> automatically on a finalize() if they forget to unregister.
> Or you keep the session, but mark it in a special way so that Oak can
> internally save resources. Or maybe the session in oak is already so
> light-weight that this is actually no problem at all. I am just
> mentioning this because of the many warnings I get from oak in the
> logs about sessions that are open very long (and don't call refresh()
> IIRC).

I wouldn't care about the extra sessions. These should be cheap. And if 
they aren't we should fix that. Re. the warning: this should only be 
logged once per session and is meant as help for migrating to Oak. We 
can try to further lower the noise if required or remove it entirely.

> 7) move code to oak-jcr (minor)
> Quite a bit of the code in oak.plugins.observation is currently JCR
> eventing specific (EventGenerator for example), afaics this is better
> suited in oak-jcr. Although with 2) it might change a bit anyway.

Many parts just recently moved to oak-core (OAK-950). After untangling 
the dependencies (OAK-1143, ongoing) it might be possible to move the 
respective parts back to oak-jcr.


> Cheers, Alex

View raw message