lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Lu <>
Subject Re: Index Replication / Clustering
Date Tue, 28 Jun 2005 03:18:28 GMT
DBSight, a J2EE search engine on database, meets most of your requirements.
It has clustering support. Basically you can configure one DBSight 
server specially for indexing on database content. Another or several 
other DBSight servers devoted to search, and they subscribe to the 
indexing server. When indexing server has created a new index, all 
subscribers will get the same index.

It supports all databases with JDBC.Any contents on database, including 
BLOB/CLOB, can be full-text indexed.

And you can configure most of the components just be web UI.

Check it out:
See the online active demo:

I personally run it on an bug database system of a big company, 60 
And I scheduled the indexing every other 4 minutes. The index size is 
10G, and growing.
I guess it can meet your requirement.


Stephane Bailliez wrote:

> I have been browsing the archives concerning this particular topic.
> I'm in the same boat and the customer has clustering requirements.
> To give some background:
> I have a constant flow of incoming messages flying over the network 
> that need to be archived in db, indexed and dispatched to thousand of 
> clients (rich client console).
> the backend architecture needs to be clustered meaning that:
> - the message broker needs to be clustered
> - the database needs to be replicated and support failover
> - the search engine index needs to be replicated
> This is for a 24x7 operation.
> My main problem is that there is a constant flow of write just about 
> everywhere meaning that the lucene index keeps changing, and that I 
> have a very small window available to replicate the data across the 
> network.
> (As of now, I have 2 messages / minute and should go over 50 in the 
> medium-term).
> Concerning the index, being able to replicate is cool, but if one node 
> goes down, it must be able to resynchronize when you bring it up on 
> the cluster...that's a hell of problem.
> As it is acceptable to have downtime on the search engine, I was 
> thinking it was much easier to:
> 1) rely on a shared index via NFS for each node.
> 2) dedicate a box to the search engine and access it via rpc from each 
> node
> Considering the messages I have seen in the archives, 1) seems to be a 
> no-go.
> Option 2) is generally not recommended but think it could fit my needs 
> quite well. IMHO it should work quite well to bring the box in 
> operation if it goes down. Synchronizing the index for me is just a 
> matter of going through the database to reindex the archived content, 
> this will take sometime but as I said, running in degraded mode is 
> acceptable.
> As anyone any suggestion/recommendation/experience/thoughts concerning 
> the problems mentionned above ?
> Cheers,
> Stephane
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message