jackrabbit-users mailing list archives

From Ian Boston <...@tfd.co.uk>
Subject Re: Revision cleanup
Date Tue, 09 Feb 2010 09:29:02 GMT

On 8 Feb 2010, at 20:26, Michael Yin wrote:

> Our jackrabbit 1.4.x db has 80000 revisions. If we don't care about
> version history, but also want to add new 'cluster nodes' at any point
> but don't want to sit waiting for jackrabbit to process 80,000
> revisions, is there any way to reset the revision counter to speed that
> up? Currently we tend to copy around local repo folder, but that is just
> asking for corruption.

We have been running in production for about 18 months in an 8-node cluster with JR 1.4. Our
app servers are hosted on Xen VMs, and we drop and recreate them to adjust for load. Here
is what we do.

1. We rsync a backup of the local repo onto a shared server, performing sequential rsyncs until
we get no modifications in the state of the files on disk from beginning to end.
1a. Once we have a stable copy, we tar that up and send it to a central backup server as a "snapshot"
of the local node.
2. To determine if the snapshot is stable, we read the local revisions file from the local
repo and compare it to the state in the central DB. If they are the same, we know nothing was
pending in the local state, so if there are no rsync changes the snapshot is stable and in
sync with the DB.
3. We store the local revision numbers of all the snapshots in one place, and periodically
clean the revision history in the DB up to the lowest revision number.
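
As a rough illustration only (this is not the actual scripts, and every name in it is
hypothetical), the logic of steps 1 to 3 could be sketched in Python along these lines. The
sync pass is passed in as a callable that returns the number of files it changed, and the
revision numbers are passed in as plain integers, because how they are read depends on the
journal configuration.

```python
from typing import Callable, Iterable, Optional


def sync_until_quiet(sync_once: Callable[[], int], max_passes: int = 10) -> bool:
    """Repeat sync passes until one pass reports zero changed files.

    sync_once is a hypothetical callable that performs one sync pass
    (for example one rsync run) and returns how many files it changed.
    """
    for _ in range(max_passes):
        if sync_once() == 0:
            return True
    return False  # never settled; do not take a snapshot


def snapshot_is_stable(quiet: bool, local_revision: int, global_revision: int) -> bool:
    """Step 2: stable only if the last sync changed nothing AND the node's
    local revision matches the revision recorded in the central DB."""
    return quiet and local_revision == global_revision


def safe_cleanup_revision(snapshot_revisions: Iterable[int]) -> Optional[int]:
    """Step 3: the journal may be cleaned up to the lowest revision held by
    any stored snapshot. With no snapshots, nothing is safe to clean."""
    revisions = list(snapshot_revisions)
    return min(revisions) if revisions else None
```

For example, with snapshots recorded at revisions 80000, 79950 and 79990, the cleanup bound
would be 79950, so every stored snapshot can still catch up from the remaining journal.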

On creation of a VM to join the cluster:
1. We find the latest snapshot from any node.
2. Unpack the snapshot.
3. Modify local settings (server ID etc).
4. Bring the node up, at which point it catches up with the rest of the cluster, usually a delay
of < 1 min.
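
The snapshot selection can be sketched the same way. This is a minimal Python sketch under
assumptions: the Snapshot record, its fields and the is_usable check (for example, "the
tarball unpacks cleanly") are all hypothetical and not part of Jackrabbit. It tries the
newest snapshot first and falls back to older ones if a candidate turns out to be bad.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class Snapshot:
    node_id: str     # node the snapshot was taken from (hypothetical field)
    revision: int    # local revision recorded when the snapshot was taken
    taken_at: float  # Unix timestamp of the snapshot


def newest_first(snapshots: Iterable[Snapshot]) -> List[Snapshot]:
    """Order candidates so the most recent snapshot is tried first."""
    return sorted(snapshots, key=lambda s: s.taken_at, reverse=True)


def first_usable(
    snapshots: Iterable[Snapshot],
    is_usable: Callable[[Snapshot], bool],
) -> Optional[Snapshot]:
    """Return the newest snapshot that passes the caller's check, or None."""
    for snapshot in newest_first(snapshots):
        if is_usable(snapshot):
            return snapshot
    return None
```

The newer the chosen snapshot, the fewer revisions the new node has to replay when it is
brought up.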

This was all implemented as Perl scripts and, as I say, has been good for about 18 months.
The nice part is that at any one time we have about 8 good snapshots, so if for any reason 1 is
bad, there are 7 more to try.

The critical part is to get the snapshot stable before taking it. Unfortunately, there is no
way of pausing JR to allow this to happen, although we could have put something into the ClusterNode
implementation to trigger a snapshot. I suspect under really heavy load this would not work.


> I was thinking about exporting to XML then reimporting into a clean
> repo, but there must be a better way than that. 
> -mike
