jackrabbit-dev mailing list archives

From Christian Stocker <christian.stoc...@liip.ch>
Subject Re: Add more options to make Jackrabbit more failsafe and/or scale-out
Date Wed, 04 May 2011 13:25:24 GMT
Hi All

I wrote a broader blog post on the topic at

http://blog.liip.ch/archive/2011/05/04/how-to-make-jackrabbit-globally-distributable-fail-safe-and-scalable-in-one-go.html

How should I proceed if I'd like to get that into Jackrabbit? Just open
a ticket, attach the patch and hope that someone takes it from there? Or
do you think it doesn't stand a chance of getting into jackrabbit-core?

greetings

chregu

On 02.05.11 14:48, Christian Stocker wrote:
> 
> 
> On 02.05.11 14:43, Bart van der Schans wrote:
>> On Mon, May 2, 2011 at 1:39 PM, Christian Stocker
>> <christian.stocker@liip.ch> wrote:
>>> Hi all
>>>
>>> My favourite topic again. Building a fail-safe and/or scalable
>>> jackrabbit setup.
>>>
>>> We wanted to make our setup resistant to a datacenter failure, i.e. if
>>> one DC goes down, we can still serve pages from a backup jackrabbit
>>> instance. We use MySQL as the persistent store; that's not a given, but
>>> I guess the problems are the same everywhere.
>>>
>>> With a traditional setup, if the main DC goes down, your store goes down
>>> with it and the jackrabbit instance in the other DC can't access it
>>> anymore either. That's why we thought about replicating the MySQL DB to
>>> the 2nd DC and just reading from there (we can make sure that nothing
>>> writes to the backup jackrabbit instance). This works fine. As we can
>>> already point the cluster journal "store" to a different place than the
>>> PM, we just point the journal store to the central one in the 1st DC and
>>> read the data from the PM in the MySQL slave in the 2nd DC. A read-only
>>> jackrabbit only has to write to the journal table and nowhere else
>>> AFAIK, so that works well even with a replicating MySQL.
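>>>
>>> For illustration, the relevant pieces of the backup instance's
>>> repository.xml could look roughly like this (hostnames, credentials and
>>> the choice of MySqlPersistenceManager are just placeholders, adjust to
>>> your setup):
>>>
>>>   <!-- the cluster journal still points at the central MySQL master in DC 1 -->
>>>   <Cluster id="backup-dc2" syncDelay="2000">
>>>     <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
>>>       <param name="driver" value="com.mysql.jdbc.Driver"/>
>>>       <param name="url" value="jdbc:mysql://mysql-master.dc1/jackrabbit"/>
>>>       <param name="user" value="jackrabbit"/>
>>>       <param name="password" value="secret"/>
>>>       <param name="databaseType" value="mysql"/>
>>>     </Journal>
>>>   </Cluster>
>>>
>>>   <!-- the PM reads from the local MySQL slave in DC 2 -->
>>>   <PersistenceManager
>>>       class="org.apache.jackrabbit.core.persistence.pool.MySqlPersistenceManager">
>>>     <param name="driver" value="com.mysql.jdbc.Driver"/>
>>>     <param name="url" value="jdbc:mysql://mysql-slave.dc2/jackrabbit"/>
>>>     <param name="user" value="jackrabbit"/>
>>>     <param name="password" value="secret"/>
>>>     <param name="schemaObjectPrefix" value="pm_"/>
>>>   </PersistenceManager>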
>>>
>>> All fine and good, and even if the master MySQL goes down, the Jackrabbit
>>> instance in the 2nd DC serves its nodes as if nothing had happened.
>>>
>>> The one problem that's left is that there's a replication lag between
>>> the master and the slave MySQL (there is one, even if they sit right
>>> next to each other). What can happen is that a writing jackrabbit
>>> writes a new node and its journal entry, and the backup jackrabbit then
>>> reads the journal entry (from the mysql master) before the actual
>>> content has arrived in the mysql slave (where the backup jackrabbit
>>> reads its PM data from). This can easily be reproduced by stopping the
>>> mysql replication.
>>>
>>> The solution I came up with was to read the journal entries from the
>>> MySQL slave as well (but still write the LOCAL_REVISION to the master).
>>> With this we can make sure the jackrabbit in the 2nd DC only reads
>>> entries which are already in its mysql slave. A patch which makes this
>>> work is here:
>>>
>>> https://gist.github.com/951467
>>>
>>> The only thing I had to change was to read the "selectRevisionsStmtSQL"
>>> from the slave instead of the master; the rest can still go to the
>>> master.
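>>>
>>> To make the idea concrete, here is a schematic sketch in plain JDBC
>>> (not the actual patch, which works inside DatabaseJournal; table and
>>> column names follow the default DDL without a schemaObjectPrefix):
>>>
>>>   import java.sql.Connection;
>>>   import java.sql.PreparedStatement;
>>>   import java.sql.ResultSet;
>>>   import java.sql.SQLException;
>>>
>>>   /** Journal reads go to the slave, everything else to the master. */
>>>   class SlaveReadingJournal {
>>>       private final Connection master; // journal writes, LOCAL_REVISIONS
>>>       private final Connection slave;  // revision reads, lag-safe
>>>       private final String journalId;  // this cluster node's id
>>>
>>>       SlaveReadingJournal(Connection master, Connection slave,
>>>                           String journalId) {
>>>           this.master = master;
>>>           this.slave = slave;
>>>           this.journalId = journalId;
>>>       }
>>>
>>>       /** Reads new journal records from the SLAVE, so we never see a
>>>           revision whose content hasn't been replicated to the slave
>>>           PM yet. */
>>>       void sync(long afterRevision) throws SQLException {
>>>           String sql = "SELECT REVISION_ID, REVISION_DATA FROM JOURNAL "
>>>                      + "WHERE REVISION_ID > ? ORDER BY REVISION_ID";
>>>           try (PreparedStatement stmt = slave.prepareStatement(sql)) {
>>>               stmt.setLong(1, afterRevision);
>>>               try (ResultSet rs = stmt.executeQuery()) {
>>>                   while (rs.next()) {
>>>                       // ... apply the record, then remember it:
>>>                       setLocalRevision(rs.getLong(1));
>>>                   }
>>>               }
>>>           }
>>>       }
>>>
>>>       /** The local revision still goes to the MASTER, so the janitor
>>>           sees this instance and keeps the records it still needs. */
>>>       private void setLocalRevision(long revision) throws SQLException {
>>>           String sql = "UPDATE LOCAL_REVISIONS SET REVISION_ID = ? "
>>>                      + "WHERE JOURNAL_ID = ?";
>>>           try (PreparedStatement stmt = master.prepareStatement(sql)) {
>>>               stmt.setLong(1, revision);
>>>               stmt.setString(2, journalId);
>>>               stmt.executeUpdate();
>>>           }
>>>       }
>>>   }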
>>>
>>> What do you think of this approach? Would this be worth adding to
>>> jackrabbit? Any input for the patch what I could improve?
>>>
>>> Besides the fail-over scenario, you can also easily scale out with this
>>> approach: you can serve your "read-only" webpages from a totally
>>> different DC without much traffic between the DCs (it's basically just
>>> the MySQL replication traffic). That's why I didn't want to read from
>>> the master in the backup jackrabbit and only switch to the replicating
>>> slave when things fail (which would be a solution, too, of course).
>>>
>>> any input is appreciated
>>
>> I've played many times with the idea of creating some kind of SlaveNode
>> next to the ClusterNode which only needs read access to the database
>> (slave). I don't think the local revision of the slave is of much use
>> to the master, so it could be kept on disk locally with the slave.
> 
> AFAICT, the janitor needs to know where all the cluster instances are
> in order to safely delete everything that isn't needed anymore. That's
> why the local revision needs to be stored in a central place.
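>
> Roughly what the janitor does (paraphrased, assuming the default table
> names; see DatabaseJournal for the real code):
>
>   import java.sql.*;
>
>   class JanitorSketch {
>       /** Deletes journal records no instance needs anymore. An instance
>           whose revision lives only on its own disk would be invisible
>           here, and its unconsumed records could be cleaned away. */
>       static void sweep(Connection master) throws SQLException {
>           long minRevision;
>           try (Statement st = master.createStatement();
>                ResultSet rs = st.executeQuery(
>                    "SELECT MIN(REVISION_ID) FROM LOCAL_REVISIONS")) {
>               rs.next();
>               minRevision = rs.getLong(1);
>           }
>           try (PreparedStatement del = master.prepareStatement(
>                    "DELETE FROM JOURNAL WHERE REVISION_ID < ?")) {
>               del.setLong(1, minRevision);
>               del.executeUpdate();
>           }
>       }
>   }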
> 
> chregu
> 
