From Christian Stocker <christian.stoc...@liip.ch>
Subject Add more options to make Jackrabbit more failsafe and/or scale-out
Date Mon, 02 May 2011 11:39:37 GMT
Hi all

My favourite topic again: building a fail-safe and/or scalable
Jackrabbit setup.

We wanted to make our setup resistant to the failure of a whole
datacenter, e.g. if one DC goes down, we can still serve pages from a
backup Jackrabbit instance. We use MySQL as the persistent store; that's
not a given, but I guess the problems are the same everywhere.

With a traditional setup, if the main DC goes down, your store goes down
with it and the Jackrabbit instance in the other DC can't access it
anymore either. That's why we thought about replicating the MySQL DB to
the 2nd DC and just reading from there (we can make sure that nothing
writes to the backup Jackrabbit instance). This works fine. Since we can
already point the cluster journal "store" somewhere other than the PM
(persistence manager), we just point the journal store to the central
one in the 1st DC and read the PM data from the MySQL slave in the 2nd
DC. A read-only Jackrabbit instance only has to write to the journal
table and nowhere else, AFAIK, so that works well even with replicating
MySQLs.

All fine and good: even if the master MySQL goes down, the Jackrabbit
instance in the 2nd DC keeps serving its nodes as if nothing happened.

The one problem left is the replication lag between the master and the
slave MySQL (there always is one, even if they sit right next to each
other). Because of it, a writing Jackrabbit instance can write a new
node and its journal entry, and the backup Jackrabbit then reads that
entry from the journal (on the MySQL master) while the actual content
hasn't reached the MySQL slave yet (where the backup Jackrabbit reads
its PM data from). This can easily be tested by stopping the MySQL
replication.
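
To make the race concrete, here is a toy model (not Jackrabbit code):
the master's committed changes are an ordered log, and the slave has
applied only a prefix of it. The log format and the `view`/`sync`
helpers are hypothetical. Real Jackrabbit invalidates cached items when
it processes a journal entry rather than copying content, so "missing"
here stands in for "the backup instance reads absent or stale data and
may keep serving it after replication catches up, because the journal
entry has already been consumed".

```python
# Toy model (not Jackrabbit code): the master's committed changes as an
# ordered log. The node content is written before its journal entry.
log = [
    ("pm", "/a", "hello"),   # persistence manager write on the master
    ("journal", 1, "/a"),    # journal entry announcing revision 1
]


def view(events):
    """Materialize PM contents and journal entries from a list of log events."""
    pm, journal = {}, []
    for kind, key, value in events:
        if kind == "pm":
            pm[key] = value
        else:
            journal.append((key, value))
    return pm, journal


def sync(journal, pm, local_revision):
    """Consume journal entries newer than local_revision.

    Returns the new local revision and the nodes whose content was not
    found in pm; those entries are consumed anyway, so they are never
    looked at again.
    """
    missing = []
    for revision, node_id in journal:
        if revision > local_revision:
            if node_id not in pm:
                missing.append(node_id)
            local_revision = revision
    return local_revision, missing


# Replication is stopped: the slave has applied none of the log yet.
slave_applied = 0
_, master_journal = view(log)
slave_pm, _ = view(log[:slave_applied])

# Original setup: journal from the master, PM data from the slave.
print(sync(master_journal, slave_pm, 0))  # (1, ['/a'])
```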

The solution I came up with is to read the journal entries from the
MySQL slave as well (but still write the LOCAL_REVISION to the master).
This way, the Jackrabbit instance in the 2nd DC only reads journal
entries which have already been replicated to its MySQL slave. A patch
which makes this work is here:


The only thing I had to change was to run the "selectRevisionsStmtSQL"
query against the slave instead of the master; everything else can still
go to the master.
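
The same kind of toy model (not Jackrabbit code, hypothetical helpers)
with the patched behaviour: both the journal entries and the PM data
come from the slave. It assumes that the node content is committed on
the master before the journal entry announcing it, and that the slave
applies changes in master commit order. Under those assumptions, no
matter how far replication lags, the backup never consumes a journal
entry whose content it can't see.

```python
# Toy model (not Jackrabbit code): ordered log of changes committed on the
# master. Assumption: content is committed before the journal entry for it.
log = [
    ("pm", "/a", "hello"),
    ("journal", 1, "/a"),
    ("pm", "/b", "world"),
    ("journal", 2, "/b"),
]


def view(events):
    """Materialize PM contents and journal entries from a list of log events."""
    pm, journal = {}, []
    for kind, key, value in events:
        if kind == "pm":
            pm[key] = value
        else:
            journal.append((key, value))
    return pm, journal


def sync(journal, pm, local_revision):
    """Consume journal entries newer than local_revision; report unseen content."""
    missing = []
    for revision, node_id in journal:
        if revision > local_revision:
            if node_id not in pm:
                missing.append(node_id)
            local_revision = revision
    return local_revision, missing


# Patched setup: journal entries AND PM data are read from the slave, which
# may have applied any prefix of the master's log (i.e. any amount of lag).
results = []
for applied in range(len(log) + 1):
    slave_pm, slave_journal = view(log[:applied])
    results.append(sync(slave_journal, slave_pm, 0))

print(results)
# [(0, []), (0, []), (1, []), (1, []), (2, [])]
```

The local revision simply trails the replication position; the backup
catches up once the slave does, without ever skipping content.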

What do you think of this approach? Would it be worth adding to
Jackrabbit? Any input on what I could improve in the patch?

Besides the fail-over scenario, you can also easily scale out with this
approach: you can serve your "read-only" web pages from a totally
different DC without much traffic between the DCs (it's basically just
the MySQL replication traffic). That's why I didn't want the backup
Jackrabbit to read from the master and only switch to the replicating
slave when things fail (which would be a solution too, of course).

Any input is appreciated.


