jackrabbit-users mailing list archives

From Ian Boston <...@tfd.co.uk>
Subject Re: Adding new nodes to a cluster
Date Tue, 14 Sep 2010 09:48:16 GMT

On 14 Sep 2010, at 19:27, Vidar Ramdal wrote:

> We're setting up a clustered Jackrabbit application.
> The application has high traffic, so we're concerned that the Journal
> table will be very large. This, in turn, will make setting up new
> nodes a time-consuming task, when the new node starts replaying the
> journal to get up to date.
> 
> At [1], the concept of the janitor is described, which cleans the
> journal table at certain intervals. However, the list of caveats
> states that "If the janitor is enabled then you loose the possibility
> to easily add cluster nodes. (It is still possible but takes detailed
> knowledge of Jackrabbit.)"
> 
> What detailed knowledge does this take? Can anyone give me some hints
> of what we need to look into?

Sure,
you need to take a good copy of the local state of a node and, for good measure, extract the
journal record number for that state from the journal revision file. (Have a look at the ClusterNode
implementation to locate the file and its format; IIRC it's 2 binary longs.)
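
To check what a snapshot's revision file actually contains, a minimal sketch along these lines
may help. It assumes the file holds raw 8-byte big-endian signed longs (the byte order Java's
writeLong produces) and simply prints every long it finds, so it does not depend on whether
there are one or two of them. The function name read_longs and the path revision.log are
illustrative assumptions, not part of Jackrabbit.

```python
import struct
from pathlib import Path


def read_longs(path):
    """Return every big-endian signed 64-bit value stored in the file.

    Assumption: the revision file is a plain sequence of 8-byte longs.
    Any trailing bytes that do not fill a whole long are ignored.
    """
    data = Path(path).read_bytes()
    usable = len(data) - len(data) % 8
    return [struct.unpack_from(">q", data, off)[0] for off in range(0, usable, 8)]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python read_revision.py /path/to/revision.log
    for value in read_longs(sys.argv[1]):
        print(value)
```

Comparing the printed values with the highest revision in the journal table should show
whether the snapshot really carried its record number along.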

Getting a good local state means one that is consistent with itself and wasn't written to
between the start and the end of the snapshot operation. If you have high write traffic you
almost certainly won't be able to snapshot the local state live, and will have to take a node
offline before taking a snapshot. If it's high read, low write, you might be able to use repeated
rsync operations to get a good snapshot.
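
The "copy, then check nothing changed" idea can be sketched as follows. This is not rsync and
not a Jackrabbit tool, just a rough illustration: it records each file's size and modification
time, copies the directory, and accepts the copy only if the source looks the same afterwards.
Unchanged size and mtime is a heuristic, not proof of consistency, and the names manifest and
snapshot_if_quiet are made up for this example.

```python
import os
import shutil


def manifest(root):
    """Map each file's path (relative to root) to (size, mtime in ns)."""
    entries = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            entries[os.path.relpath(full, root)] = (st.st_size, st.st_mtime_ns)
    return entries


def snapshot_if_quiet(src, dst, attempts=3):
    """Copy src to dst, retrying until no file changed during the copy.

    Returns True if a copy completed while the source manifest stayed
    identical, False if every attempt saw a change. Files that vanish
    mid-copy will raise; a real script would need to handle that.
    """
    for _ in range(attempts):
        before = manifest(src)
        shutil.rmtree(dst, ignore_errors=True)  # start each attempt clean
        shutil.copytree(src, dst)
        if manifest(src) == before:
            return True
    return False
```

If this keeps returning False on a live node, that is a good sign the node has to come
offline for the snapshot.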



> 
> Also, we're not 100% sure we know what happens when a new node is
> added.

If there is no snapshot to start from, it will replay all journal records since record 0 to
build the search index and anything else. If there is a snapshot, it will read the record number
of the snapshot and replay from that point forwards.

Before using a snapshot you must make certain that all references to the server name are correct
in the snapshot (look in repository.xml after startup).
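
One quick check, assuming the "server name" in question is the cluster node id kept as the id
attribute of the Cluster element in repository.xml (that reading is an assumption, and the id
can also be supplied by other means such as a system property), is to print that attribute
from the copied file before starting the new node. The function name cluster_node_id is
illustrative only.

```python
import xml.etree.ElementTree as ET


def cluster_node_id(repository_xml_path):
    """Return the id attribute of the first Cluster element, or None.

    Assumption: the node id, if configured in the file at all, lives in
    <Cluster id="...">. None means no Cluster element or no id attribute.
    """
    root = ET.parse(repository_xml_path).getroot()
    cluster = root if root.tag == "Cluster" else root.find(".//Cluster")
    return None if cluster is None else cluster.get("id")


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_cluster_id.py /path/to/repository.xml
    print(cluster_node_id(sys.argv[1]))
```

If the value printed for the new node matches the node the snapshot was taken from, the id
still needs changing.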


> We understand that the journal needs to be replayed so that the
> Lucene index can be updated. But is the Lucene index the only thing
> that needs modification when a new node is started?

AFAIK, yes.

> If so, should this procedure work:
> 1. Take a complete snapshot (disk image) of one of the live nodes -
> including the Lucene index
> 2. Use the disk image to setup a new node
> 4. Assign a new, unique cluster node ID to the new node

Yes (I didn't need to write all that I did above :) )

> 
> However, when trying this procedure, we still experienced that the new
> node replayed the entire journal.

Hmm, did you get the local journal record number with the snapshot?

> 
> Is there more we need to add to the procedure, so that we can add new
> nodes without having to replay all (perhaps a year's worth of) journal
> entries?

AFAIK, no.
We have been running in production with an 8-node cluster for about 2 years and have regularly
brought new nodes in and out... however... the journal does effectively sync all nodes into
a single-threaded operation, so only use it for reliability or if you have other processing
that needs to be distributed. You will not get parallel speedup of the Jackrabbit part.

HTH
Ian


> 
> [1] http://wiki.apache.org/jackrabbit/Clustering#Removing_Old_Revisions
> 
> -- 
> Vidar S. Ramdal <vidar@idium.no> - http://www.idium.no
> Sommerrogata 13-15, N-0255 Oslo, Norway
> + 47 22 00 84 00 / +47 22 00 84 76
> Quando omni flunkus moritatus!

