Message-ID: <4CE94E3F.5020907@dig.de>
Date: Sun, 21 Nov 2010 17:52:15 +0100
From: Markus Blaurock <blaurock@dig.de>
To: users@jackrabbit.apache.org
Subject: Cool feature: Cluster for replication?

Hello all,

While discussing backup strategies for our Jackrabbit repository, the
following "feature" came to mind:

What about using the cluster journal to replicate the persistent
storage?

Our goal is to keep a copy of our repository in a second database.
Ideally it would lag slightly behind the main repository, so that it
reflects the data from, e.g., an hour ago.

If we could instruct a specialized cluster node to share the same
journal but use a different persistent storage, that would be the first
step.
The second step would be to tell the cluster sync thread to sync
revisions from a given offset up to the current global revision.
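
To make the second step more concrete, here is a minimal sketch of the
selection logic we have in mind. It is only a toy model and does not use
any Jackrabbit classes: JournalRecord, selectForReplay and all parameter
names are hypothetical, and we simply assume that each journal record
carries a revision number and a creation timestamp.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: none of these types exist in Jackrabbit.
class DelayedReplaySketch {

    // Hypothetical journal record: revision number, creation time, payload.
    static final class JournalRecord {
        final long revision;
        final long timestampMillis;
        final String change;

        JournalRecord(long revision, long timestampMillis, String change) {
            this.revision = revision;
            this.timestampMillis = timestampMillis;
            this.change = change;
        }
    }

    /**
     * Selects the records a delayed replica may apply now: revisions after
     * localRevision, up to and including globalRevision, that are at least
     * delayMillis old. The journal is assumed to be ordered by revision.
     */
    static List<JournalRecord> selectForReplay(List<JournalRecord> journal,
                                               long localRevision,
                                               long globalRevision,
                                               long delayMillis,
                                               long nowMillis) {
        List<JournalRecord> due = new ArrayList<>();
        for (JournalRecord r : journal) {
            if (r.revision <= localRevision) {
                continue; // already applied on the replica
            }
            if (r.revision > globalRevision) {
                break; // beyond the requested upper bound
            }
            if (nowMillis - r.timestampMillis < delayMillis) {
                break; // too recent; stop here so revisions stay in order
            }
            due.add(r);
        }
        return due;
    }

    public static void main(String[] args) {
        List<JournalRecord> journal = new ArrayList<>();
        journal.add(new JournalRecord(1, 0, "a"));
        journal.add(new JournalRecord(2, 1000, "b"));
        journal.add(new JournalRecord(3, 5000, "c"));
        // Replica is at revision 1, the delay is 3 seconds, "now" is t=7000.
        List<JournalRecord> due = selectForReplay(journal, 1, 3, 3000, 7000);
        System.out.println(due.size() + " record(s) due for replay");
    }
}
```

The replica would apply the selected records to its own persistent
storage and then store the last applied revision as its new offset.
Stopping at the first record that is too recent (instead of skipping it)
keeps the revisions applied strictly in journal order.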

I read the source code and found the method "doExternal" in the
SharedItemStateManager, which is responsible for applying changes from
the journal.
Inside I found the line "state.copy(currentState, true)", which applies
the full state from the journal to the actual state. Is this correct?
The clustering documentation says that every cluster node must have
access to the same persistent storage, because the property values are
not included in the journal. How does this relate to "state.copy"?
The backup repository would of course share the same datastore as the
main repository, so the journal would only need to carry the values from
the persistent storage. Is this really as big a performance issue as the
docs say?

What else would have to be done, besides adding functionality for item
creation in "doExternal", to implement the cluster-for-replication
feature? Are there any obstacles we don't see?

I think such a feature would be a big leap toward high availability of a
JCR repository, because it avoids the time-consuming index (re)creation
when restoring a backup. Just have a second, replicated repository and
you're done! With the delay function for replication we could also avoid
data-corruption issues...

regards
Markus