From: Markus Blaurock
Date: Sun, 21 Nov 2010 17:52:15 +0100
To: users@jackrabbit.apache.org
Subject: Cool feature: Cluster for replication?
Message-ID: <4CE94E3F.5020907@dig.de>

Hello all,

While discussing backup strategies for our Jackrabbit repository, the following "feature" came to mind: what about having the cluster journal replicate the persistent storage?
Our goal is to keep a copy of our repository in a second database. Ideally it would lag behind a little, so that it reflects the data from, e.g., an hour ago.

The first step would be to instruct a specialized cluster node to share the same journal but use a different persistent storage. The second step would be to tell the cluster sync thread to sync revisions from a given offset up to the current global revision.

I read the source code and found the method "doExternal" in the SharedItemStateManager, which is responsible for applying changes from the journal. Inside I found the line "state.copy(currentState, true)", which appears to apply the full state from the journal to the actual state. Is this correct?

The clustering documentation says that every cluster node must have access to the same persistent storage, because the property values are not included in the journal. How does this relate to "state.copy"?

The backup repository would of course share the same datastore as the main repository, so the journal would only need to carry the values from the persistent storage. Is this really as big a performance issue as the docs say?

What else would have to be done, besides adding functionality for item creation in "doExternal", to implement the cluster-for-replication feature? Are there any obstacles we don't see?

I think such a feature would be a big leap towards high availability for a JCR repository, because it avoids the time-consuming index (re)creation when restoring a backup. Just have a second, replicated repository and you're done! With the delay function for replication we could also avoid data-corruption issues, since a bad change could be caught before it reaches the copy.

Regards,
Markus
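
To make the two steps concrete, here is a minimal sketch of delayed journal replay into a separate storage. It is not Jackrabbit code: the classes Record, Journal and Replica are hypothetical, the "persistent storage" is just an in-memory map, and the sketch assumes that journal records carry the full item values (which, according to the clustering documentation mentioned above, the real journal does not).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DelayedReplaySketch {

    /** Hypothetical journal record: revision, write time and the full item state. */
    static final class Record {
        final long revision;
        final long timestampMillis;
        final String itemId;
        final String value;

        Record(long revision, long timestampMillis, String itemId, String value) {
            this.revision = revision;
            this.timestampMillis = timestampMillis;
            this.itemId = itemId;
            this.value = value;
        }
    }

    /** Hypothetical shared journal: append-only, revisions start at 1. */
    static final class Journal {
        private final List<Record> records = new ArrayList<>();

        long append(String itemId, String value, long nowMillis) {
            long revision = records.size() + 1;
            records.add(new Record(revision, nowMillis, itemId, value));
            return revision;
        }

        long globalRevision() {
            return records.size();
        }

        /** Returns all records with a revision greater than the given one. */
        List<Record> since(long revision) {
            return new ArrayList<>(records.subList((int) revision, records.size()));
        }
    }

    /** Hypothetical replica node: shares the journal but writes to its own storage. */
    static final class Replica {
        private final Map<String, String> storage = new HashMap<>();
        private final long delayMillis;
        private long localRevision = 0;

        Replica(long delayMillis) {
            this.delayMillis = delayMillis;
        }

        /** Applies records after the local revision that are at least delayMillis old. */
        int sync(Journal journal, long nowMillis) {
            int applied = 0;
            for (Record r : journal.since(localRevision)) {
                // Assumes timestamps never decrease, so later records are newer still.
                if (nowMillis - r.timestampMillis < delayMillis) {
                    break;
                }
                storage.put(r.itemId, r.value); // creates the item or overwrites it
                localRevision = r.revision;
                applied++;
            }
            return applied;
        }

        String get(String itemId) {
            return storage.get(itemId);
        }

        long localRevision() {
            return localRevision;
        }
    }

    public static void main(String[] args) {
        Journal journal = new Journal();
        Replica replica = new Replica(3_600_000L); // one hour behind

        journal.append("/a", "v1", 0L);
        journal.append("/b", "v1", 1_000_000L);
        journal.append("/a", "v2", 4_000_000L);

        int applied = replica.sync(journal, 5_000_000L);
        System.out.println("applied=" + applied
                + " local=" + replica.localRevision()
                + " global=" + journal.globalRevision()
                + " /a=" + replica.get("/a"));
    }
}
```

The replica keeps its own local revision as the offset, and the delay is enforced purely by comparing record timestamps with the current time. If a bad change shows up in the main repository, the sync could be stopped before that record becomes old enough to be applied.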