lucene-java-user mailing list archives

From "Peter W." <>
Subject Re: Clustering Lucene with 40 Servers
Date Wed, 03 Jan 2007 04:58:29 GMT

I don't have any of the scalability requirements mentioned in this
thread, but the problem is an interesting one.
IMHO, Lucene needs the equivalent of a connection pool, or a
best-practices method for load balancing.

Opening, locking, reading and writing to remote indexes over RMI
seems good on paper but is likely to melt under anything approaching
the kind of web traffic seen by a popular site. This is why you see
people running (so many) JVMs locally. Solr helps, but passing long
XML or JSON URLs for thousands or millions of requests between your
own machines to maintain a Lucene index looks redundant to me.

Adding messaging layers to propagate changes or updates introduces  
more points of failure.

I wonder if a system where just a few machines capture, say, 100k
updates (at a time) in memory and then write .gz files to locally
attached external drives would work. These separate data files would
be exposed through a web service, where load-balanced remote boxes
access them using servlets.

They connect in rotation, downloading batched index updates. Heck,
start splitting up big files using Hadoop's HDFS and make it a party!
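
The capture-and-batch idea above could be sketched roughly as follows.
This is only an illustration under stated assumptions, not a tested
design: the class name UpdateBatcher, the one-update-per-line text
format, and the in-memory byte arrays standing in for .gz files on
disk are all hypothetical. It uses only java.util.zip from the JDK and
no Lucene APIs.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical sketch: collects updates in memory and emits one
 * gzip-compressed batch each time batchSize updates have accumulated.
 * Assumes each update is a single line of text without newlines.
 */
class UpdateBatcher {
    private final int batchSize;
    private final List<String> pending = new ArrayList<>();
    // Stand-in for .gz files on a local drive (assumption for brevity).
    private final List<byte[]> finishedBatches = new ArrayList<>();

    UpdateBatcher(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
    }

    /** Queues one update and flushes automatically when the batch is full. */
    void add(String update) {
        pending.add(update);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Compresses all pending updates into a new batch; no-op if none are pending. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the writer finishes the gzip stream before we read the bytes.
        try (Writer out = new OutputStreamWriter(
                new GZIPOutputStream(bytes), StandardCharsets.UTF_8)) {
            for (String update : pending) {
                out.write(update);
                out.write('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        finishedBatches.add(bytes.toByteArray());
        pending.clear();
    }

    /** Batches completed so far, oldest first. */
    List<byte[]> batches() {
        return finishedBatches;
    }

    /** What a downloading box would do: decompress a batch back into update lines. */
    static List<String> readBatch(byte[] gz) {
        List<String> updates = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(gz)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                updates.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return updates;
    }
}
```

In a real setup the batch size would be closer to the 100k mentioned
above, flush() would write each batch to a .gz file on the attached
drive, and a servlet would serve those files to the remote boxes,
which would then apply the decoded updates to their local indexes.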


Peter W.
