jackrabbit-dev mailing list archives

From "Alex Parvulescu (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (JCR-3162) Index update overhead on cluster slave due to JCR-905
Date Tue, 06 Dec 2011 12:29:40 GMT

     [ https://issues.apache.org/jira/browse/JCR-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alex Parvulescu updated JCR-3162:

    Attachment: JCR-3162-v3.patch

V3 comes with a complete redesign of the patch.

After further analysis, we've decided to inspect the incoming journal changes when an
initial index re-build takes place.

I'll try to clarify. The scope of the JCR-905 fix should *only* be the initial index build.

The initial indexing operation can cause duplicates to appear, as some nodes can be seen by
a slave before the ADD event has reached it. This happens because storage is shared between
cluster nodes.
So, when a slave starts to re-index the repository content, it will include *everything*
(potentially also nodes that it hasn't received an ADD event for yet).
When the indexing finishes, the repository will continue its startup. A bit later, the cluster
component will also initialize and consequently sync. This will pull in the ADD events that
were pending in a newer revision on the master, so those nodes get indexed a second time.

V3 polls the pending changes before the cluster.sync call and preemptively generates DELETE
events for all the ADD events that it finds for the current workspace.
(This is similar to the JCR-905 patch, but with a much smaller scope.)
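
As a rough illustration of that pre-sync step (not the code in the attached patch), the
selection of nodes to delete could look like the sketch below. The class PreSyncDedupSketch,
its Change and Type types, and the method idsToDeleteBeforeSync are all hypothetical names
made up for this example; they are not Jackrabbit classes or APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the pre-sync step. None of these types are Jackrabbit classes. */
public class PreSyncDedupSketch {

    enum Type { ADD, REMOVE, MODIFY }

    /** Hypothetical stand-in for one pending journal change. */
    static final class Change {
        final Type type;
        final String workspace;
        final String nodeId;

        Change(Type type, String workspace, String nodeId) {
            this.type = type;
            this.workspace = workspace;
            this.nodeId = nodeId;
        }
    }

    /**
     * Returns the ids of nodes to delete from the freshly built index, so that
     * replaying the pending ADD events during sync does not index them twice.
     * Only ADD events for the given workspace are considered.
     */
    static List<String> idsToDeleteBeforeSync(List<Change> pending, String workspace) {
        Set<String> ids = new LinkedHashSet<>(); // keeps order, drops repeated ids
        for (Change c : pending) {
            if (c.type == Type.ADD && c.workspace.equals(workspace)) {
                ids.add(c.nodeId);
            }
        }
        return new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        List<Change> pending = List.of(
                new Change(Type.ADD, "default", "node-1"),
                new Change(Type.MODIFY, "default", "node-2"),
                new Change(Type.ADD, "other", "node-3"),
                new Change(Type.ADD, "default", "node-1"));
        System.out.println(idsToDeleteBeforeSync(pending, "default")); // prints [node-1]
    }
}
```

Restricting the selection to ADD events of the current workspace is what keeps the scope
smaller than a blanket delete-before-add on every synced revision.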

Another feature introduced in the patch is a forced flush of the index after the initial
index has been created.
This was done artificially in the original test case (there is no unit test, though) by:
> However, when I debug clusternode 2 and have a breakpoint (i.e., a pause of a few seconds
> at line 306 of RepositoryImpl.java - just before the clusternode is started), then the resultset
> contains two results, both with the same UUID.

So forcing the index flush will correctly reproduce the original problem, and I think it
should be the correct behaviour of the original index creation.
On the other hand, not flushing the index will hide the problem, because the indexing queue
is smart enough to remove duplicates.
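
To illustrate why the flush matters, here is a toy model, assuming a pending queue that
de-duplicates by node id and a flushed index that simply appends. FlushVisibilitySketch
and its methods are hypothetical names for this example only and do not reflect how the
Jackrabbit index is actually implemented.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: a de-duplicating pending queue in front of an append-only index. */
public class FlushVisibilitySketch {

    // Entries not flushed yet; adding the same node id twice keeps a single entry.
    private final Set<String> pending = new LinkedHashSet<>();
    // Entries already flushed; append-only in this model, so duplicates persist.
    private final List<String> committed = new ArrayList<>();

    void add(String nodeId) {
        pending.add(nodeId);
    }

    void flush() {
        committed.addAll(pending);
        pending.clear();
    }

    /** Number of index entries (flushed or pending) for the given node id. */
    long count(String nodeId) {
        long flushed = committed.stream().filter(nodeId::equals).count();
        return flushed + (pending.contains(nodeId) ? 1 : 0);
    }

    public static void main(String[] args) {
        // No flush between the initial index build and the replayed ADD event.
        FlushVisibilitySketch noFlush = new FlushVisibilitySketch();
        noFlush.add("n1"); // initial index build
        noFlush.add("n1"); // replayed ADD event
        System.out.println(noFlush.count("n1")); // prints 1

        // Flush after the initial index build, then replay the ADD event.
        FlushVisibilitySketch withFlush = new FlushVisibilitySketch();
        withFlush.add("n1");
        withFlush.flush();
        withFlush.add("n1");
        System.out.println(withFlush.count("n1")); // prints 2
    }
}
```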

However, flushing the index basically invalidates JCR-905, which is a bit unexpected (to see
this, switch the feature flags off in the attached patch).

On the code itself: I guess the AbstractJournal could use a bit of refactoring on the event
polling side.

> Index update overhead on cluster slave due to JCR-905
> -----------------------------------------------------
>                 Key: JCR-3162
>                 URL: https://issues.apache.org/jira/browse/JCR-3162
>             Project: Jackrabbit Content Repository
>          Issue Type: Improvement
>          Components: clustering
>            Reporter: Alex Parvulescu
>            Priority: Minor
>         Attachments: JCR-3162-v2.patch, JCR-3162-v3.patch, JCR-3162.patch
> JCR-905 is a quick and dirty fix and causes overhead on a cluster slave node when it
> processes revisions.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

