jackrabbit-users mailing list archives

From Ard Schrijvers <a.schrijv...@onehippo.com>
Subject Re: Adding new nodes to a cluster
Date Tue, 14 Sep 2010 09:41:34 GMT
Hello Vidar,

On Tue, Sep 14, 2010 at 11:27 AM, Vidar Ramdal <vidar@idium.no> wrote:
> We're setting up a clustered Jackrabbit application.
> The application has high traffic, so we're concerned that the Journal
> table will be very large. This, in turn, will make setting up new
> nodes a time-consuming task, when the new node starts replaying the
> journal to get up to date.
> At [1], the concept of the janitor is described, which cleans the
> journal table at certain intervals. However, the list of caveats
> states that "If the janitor is enabled then you lose the possibility
> to easily add cluster nodes. (It is still possible but takes detailed
> knowledge of Jackrabbit.)"
> What detailed knowledge does this take? Can anyone give me some hints
> of what we need to look into?
> Also, we're not 100% sure we know what happens when a new node is
> added. We understand that the journal needs to be replayed so that the
> Lucene index can be updated. But is the Lucene index the only thing
> that needs modification when a new node is started?
> If so, should this procedure work:
> 1. Take a complete snapshot (disk image) of one of the live nodes -
> including the Lucene index
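[For reference, the janitor mentioned in [1] is enabled through parameters on the Journal element of the cluster configuration in repository.xml. A hedged sketch; the janitor parameter names come from the DatabaseJournal documentation, while the id, driver, and url values here are placeholders you would replace with your own:]

```xml
<Cluster id="node1" syncDelay="2000">
  <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal">
    <param name="revision" value="${rep.home}/revision.log"/>
    <!-- placeholder connection settings; use your own database -->
    <param name="driver" value="com.mysql.jdbc.Driver"/>
    <param name="url" value="jdbc:mysql://dbhost/jackrabbit"/>
    <!-- enable the journal janitor -->
    <param name="janitorEnabled" value="true"/>
    <!-- seconds between janitor runs (here: once a day) -->
    <param name="janitorSleep" value="86400"/>
    <!-- hour of day (0-23) of the first run -->
    <param name="janitorFirstRunHourOfDay" value="3"/>
  </Journal>
</Cluster>
```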

Although I am not too familiar with the clustered setup (others at my
company are), I know that this is unfortunately not possible. The
problem is that the most recent part of the Lucene index is held in
memory, so you cannot take a consistent snapshot of the index from
disk. It is something I'd love to see improved in Jackrabbit at some
point.

More generally, and related to the same issue (though also a very
large job), it would be worth seeing how clustering could work with
Infinispan keeping a clustered in-memory Lucene index, while one or
two repository nodes persist the Lucene segments to the database. That
way, I think the journals largely become redundant, adding repository
nodes to a cluster becomes trivial, and the database also holds the
persisted Lucene segments. I am confident this can work, as some
people use clustered Hibernate in this way. It does, however, imply a
large refactoring of the Jackrabbit query package: for example, moving
from the multi-index to a single index and simply re-opening it.

Unfortunately this doesn't directly help you; it is just a brain dump.

Regards Ard

> 2. Use the disk image to set up a new node
> 4. Assign a new, unique cluster node ID to the new node
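[For the last step: the cluster node ID is the id attribute of the Cluster element in repository.xml, and it can also be overridden per instance with a system property. A hedged sketch; "node2" is a placeholder name:]

```xml
<!-- each instance must get its own id -->
<Cluster id="node2" syncDelay="2000">
  <!-- Journal configuration as on the existing nodes -->
</Cluster>
```

Alternatively, the same repository.xml can be shared across instances and the id set per JVM with -Dorg.apache.jackrabbit.core.cluster.node_id=node2.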
