jackrabbit-users mailing list archives

From Vidar Ramdal <vi...@idium.no>
Subject Re: Adding new nodes to a cluster
Date Tue, 14 Sep 2010 09:50:20 GMT
> On Tue, Sep 14, 2010 at 11:27 AM, Vidar Ramdal <vidar@idium.no> wrote:
>> We're setting up a clustered Jackrabbit application.
>> The application has high traffic, so we're concerned that the Journal
>> table will be very large. This, in turn, will make setting up new
>> nodes a time-consuming task, when the new node starts replaying the
>> journal to get up to date.
>>
>> At [1], the concept of the janitor is described, which cleans the
>> journal table at certain intervals. However, the list of caveats
>> states that "If the janitor is enabled then you loose the possibility
>> to easily add cluster nodes. (It is still possible but takes detailed
>> knowledge of Jackrabbit.)"
>>
>> What detailed knowledge does this take? Can anyone give me some hints
>> of what we need to look into?
>>
>> Also, we're not 100% sure we know what happens when a new node is
>> added. We understand that the journal needs to be replayed so that the
>> Lucene index can be updated. But is the Lucene index the only thing
>> that needs modification when a new node is started?
>> If so, should this procedure work:
>> 1. Take a complete snapshot (disk image) of one of the live nodes -
>> including the Lucene index

On Tue, Sep 14, 2010 at 11:41 AM, Ard Schrijvers
<a.schrijvers@onehippo.com> wrote:
> Although I am not too familiar with the clustered setup (others at my
> company are), I know that this is not possible, unfortunately. The
> problem is that the most recent Lucene index is an in-memory one. You
> cannot get correct snapshots from the index. It is something I'd love
> to see improved in Jackrabbit at some point.

OK, but what if we shut down the application before taking the
snapshot? Would that give us a usable starting point?
The procedure would then be:
1. Shut down the live node A
2. Take a disk image snapshot of A
3. Use the disk image to create a new instance B
4. Alter the cluster node ID in B's repository.xml
5. Restart A
6. Start B
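
Step 4 could probably be scripted. Below is a minimal sketch, assuming the
cluster node ID is the id attribute of the <Cluster> element in
repository.xml and that the attribute is written with double quotes. The
function name set_cluster_id, the sample XML and the node names nodeA/nodeB
are made up for illustration. It edits the text directly instead of
re-serializing the XML, so the rest of the file is left untouched.

```python
import re

# Matches the id attribute of the opening <Cluster ...> tag.
# Assumption: the attribute value is double-quoted.
_CLUSTER_ID = re.compile(r'(<Cluster\b[^>]*?\bid=")[^"]*(")')


def set_cluster_id(repository_xml: str, new_id: str) -> str:
    """Return the repository.xml text with the <Cluster> id set to new_id."""
    updated, count = _CLUSTER_ID.subn(
        lambda m: m.group(1) + new_id + m.group(2),
        repository_xml,
        count=1,
    )
    if count != 1:
        raise ValueError("no <Cluster id=\"...\"> element found")
    return updated


# Hypothetical, heavily shortened repository.xml for illustration only.
SAMPLE = """<Repository>
  <Cluster id="nodeA" syncDelay="2000">
    <Journal class="org.apache.jackrabbit.core.journal.DatabaseJournal"/>
  </Cluster>
</Repository>"""

RESULT = set_cluster_id(SAMPLE, "nodeB")
```

In practice one would read B's repository.xml, pass its contents through
set_cluster_id, and write the result back before starting B.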

> A more general, related, but also much larger job would be to see
> how clustering would work with, optionally, Infinispan keeping a
> clustered in-memory Lucene index, and with 1 or 2 repository nodes
> storing Lucene segments in the database. This way, I think the
> journals largely become redundant, adding repository nodes to a
> cluster becomes trivial, and the database can also contain the
> persisted Lucene segments. I am confident this can work, as some
> people are using Hibernate clustered in this way. It does, however,
> imply a large refactoring of the Jackrabbit query package: for
> example, moving from the multi-index to a single one and just using
> re-open on the index reader.

+1 :)

-- 
Vidar S. Ramdal <vidar@idium.no> - http://www.idium.no
Sommerrogata 13-15, N-0255 Oslo, Norway
+47 22 00 84 00 / +47 22 00 84 76
Quando omni flunkus moritatus!
