lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <>
Subject Re: Serving remote lucene client - RMI vs HTTP
Date Mon, 16 Jul 2007 02:59:32 GMT
Hi Kumar,

I am curious about where the bulk of time was spent in Lucene.  I am  
not doubting your numbers or analysis, just want to know if there was  
anything that could be improved in Lucene and make sure you are  
solidly in need of going to multiple machines as it adds an extra  
level of complexity.  Can you provide info about number of documents,  
# of updates, # of queries etc. ?  What kind of analysis, etc. are  
you doing?  That being said, putting the web app on one machine and  
the search application on the other makes sense much in the same way  
it makes sense in most cases to do this for a database.

As for POST/GET vs. RMI, have a look at Solr, esp. its replication  
capabilities for load balancing search servers.  I think it answers  
the question in favor of the POST/GET approach.  In fact, you may be  
able to drop in Solr to your situation w/o too much work.


On Jul 15, 2007, at 10:10 PM, kumarlimbu wrote:

> Hi Everyone,
> We are using lucene,nutch and spring framework to create a specialized
> search engine. Due to growing traffic we are trying to scale. By  
> doing some
> tests we found out that the bottle neck was lucene search. We used  
> some
> heavy traffic simulation and logged the time taken by each portion  
> of the
> server response and found out that the bulk of the time was spent in
> searching from lucene index.
> In order to accomodate higher traffic we are planning on splitting our
> application in 2 portions:
> 1. Web application (on 1 machine)
> 2. Search application (one more than 1 machine)
> Each one of the application will reside on a (possibly) separate  
> machines.
> We are looking forward to scaling by adding more than 1 machine  
> dedicated to
> searching as lucene search seems to be the bottleneck.
> Web application will provide the front-end to the user. All static  
> pages,
> images and the style information will reside on this machine. It  
> will also
> serve dynamic pages but all the searching will take place on the  
> search
> application. Web application will send search parameters to the search
> application and after searching, it will send back the results to  
> the web
> application which will format it and display it to the user.
> What we are unable to decide is whether to use RMI (
> cluster4spring/index.html
> cluster4spring  )or simple HTTP POST/GET request with response in XML
> format. We did some research and found out that RMI (with clustering
> support) might be more suitable for our needs. Unfortunately our  
> team is not
> familiar with RMI and so we don't know if there will be any issues  
> with it
> during implementation.
> Advantage of using simple GET/POST is we have more control which  
> searcher
> app to use and when. An important criteria for us is to disable  
> searching
> from the searcher app who's index is being updated. This is very  
> important
> for us. Is there anyway in RMI to inform the web (client)  
> application that a
> particular server is unavailable (due to index being updated).
> We would also like to know if anybody has implemented lucene  
> searching in a
> similar fashion. Thanks for your help.
> Kumar Limbu
> -- 
> View this message in context: 
> lucene-client---RMI-vs-HTTP-tf4084167.html#a11608209
> Sent from the Lucene - Java Users mailing list archive at

Grant Ingersoll
Center for Natural Language Processing

Read the Lucene Java FAQ at

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message