lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Scott Sellman" <ssell...@valueclick.com>
Subject RE: Improving Search Performance on Large Indexes
Date Thu, 24 May 2007 21:56:13 GMT
Hi Su, 

I came across some discussion of ParallelMultiSearcher and RMI in
chapter 5 of the book Lucene in Action.  There are a couple of examples
in there, so might be a good place to start. 

-Scott

-----Original Message-----
From: Su.Cheng [mailto:su.cheng@playstarmusic.com] 
Sent: Thursday, May 24, 2007 1:09 PM
To: java-user@lucene.apache.org
Subject: Re: Improving Search Performance on Large Indexes

Hi Scott,

I met the same situation as you(index 100M documents). If the computer
has only one CPU and one disk, ParallelMultiSearcher is slower than 
MultiSearcher.

I wrote an email "Who has sample code of remote multiple servers
multiple indexes searching" yesterday. If you have any suggestion,
please let me know.

Best regards.

On Thu, 2007-05-24 at 11:38 -0700, Otis Gospodnetic wrote:
> Scott,
> 
> Yes, take your big index and split it into multiple smaller shards.
Put those shards in different servers and then query them remotely
(using the provided RMI thing in Lucene or using something custom), take
top N results from each searcher, merge those, and take top N from the
merged result set.
> 
> You could also experiment with a memory mapped Directory
implementation.
> 
> Otis
>  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
> Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
> 
> ----- Original Message ----
> From: Scott Sellman <ssellman@valueclick.com>
> To: java-user@lucene.apache.org
> Sent: Thursday, May 24, 2007 1:31:49 PM
> Subject: Improving Search Performance on Large Indexes
> 
> Hello, 
> 
>  
> 
> Currently we are attempting to optimize the search time against an
index
> that is 26 GB in size (~35 million docs) and I was wondering what
> experiences others have had in similar attempts.  Simple searches
> against the index are still fast even at 26GB, but the problem is our
> application allows the user a lot of options in searching, which can
> generate complicated queries.  Based on previous posts we decided to
try
> splitting our index into multiple indexes and use
ParallelMultiSearcher.
> When we split our single index into 6 separate ones we recorded a 25%
> decrease in response time on minimal load.  We haven't done any stress
> testing on it yet, has anyone noticed problems with increased load
when
> using ParallelMultiSearcher?  What about using machines with more
> processors in combination with the ParallelMultiSearcher, does this
> result in much response time improvement?  Or is the slow down
primarily
> with disk access?
> 
>  
> 
> Any recommendations are welcome. 
> 
>  
> 
> Thanks in advance, 
> 
> Scott
> 
>  
> 
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 
-- 
Su.Cheng <su.cheng@playstarmusic.com>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message