lucene-java-user mailing list archives

From Doug Cutting
Subject Re: Two possible solutions on Parallel Searching
Date Thu, 13 Nov 2003 17:16:38 GMT
First, note that the approaches you describe will only improve 
performance if you have multiple CPUs and/or multiple disks holding the 
indexes.

Second, MultiSearcher is currently implemented to search indexes 
serially, not each in a separate thread.  To implement multi-threaded 
searching one could subclass MultiSearcher with, e.g., ParallelSearcher, 
and override the search() methods with multi-threaded implementations. 
This would be a great contribution if someone is interested!
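
To make the idea concrete, here is a minimal sketch of the fan-out and 
merge step such a subclass would need. It is not Lucene code: SubSearcher, 
Hit and searchAll are hypothetical names invented for this illustration, 
and it assumes java.util.concurrent is available. Each sub-index is 
searched in its own thread and the per-index results are merged by score.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSearchSketch {

    /** One scored hit; 'index' records which sub-index produced it. */
    public static final class Hit implements Comparable<Hit> {
        public final int index;
        public final int doc;
        public final float score;

        public Hit(int index, int doc, float score) {
            this.index = index;
            this.doc = doc;
            this.score = score;
        }

        /** Orders hits by descending score. */
        @Override
        public int compareTo(Hit other) {
            return Float.compare(other.score, this.score);
        }
    }

    /** Hypothetical stand-in for a per-index searcher (not a Lucene type). */
    public interface SubSearcher {
        List<Hit> search(String query, int topN) throws Exception;
    }

    /**
     * Runs the same query against every sub-searcher, one thread each,
     * then merges the results and keeps the topN best-scoring hits.
     */
    public static List<Hit> searchAll(List<SubSearcher> searchers,
                                      final String query,
                                      final int topN)
            throws InterruptedException, ExecutionException {
        // newFixedThreadPool rejects a size of 0, hence the max().
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, searchers.size()));
        try {
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (final SubSearcher searcher : searchers) {
                Callable<List<Hit>> task = () -> searcher.search(query, topN);
                futures.add(pool.submit(task));
            }
            // Wait for every sub-search; get() rethrows a failure from any of them.
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> future : futures) {
                merged.addAll(future.get());
            }
            Collections.sort(merged);
            if (merged.size() > topN) {
                return new ArrayList<>(merged.subList(0, topN));
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

A real implementation would also have to map each sub-index's document 
numbers into a single numbering and make the scores comparable across 
indexes; the sketch leaves both out.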

The parallel approach I prefer is to maintain a set of indexes, each on 
a separate machine, then use something like a ParallelSearcher of 
RemoteSearchables to search them all.


Jie Yang wrote:
> I had a thought on my earlier post on "Poor
> Performance when searching for 500+ terms". 
> The problem is on how to improve the performance when
> searching for 500+ OR search terms. i.e. enter a
> search string of :
> W1 OR W2 OR W3 OR ...... OR w500.
> I thought I could rewrite the MultiSearcher class so
> that it can initiate multiple parallel IndexSearchers
> to perform the search.
> Solution 1 would be to divide the query string of 500
> OR terms into 25 queries of 20 OR terms each
> and pass them to MultiSearcher, which would
> then start 25 threads to search a single index
> directory.
> Solution 2 would be, when building an index of 1
> million docs, to build 10 index directories each
> containing 100K records instead of one single index
> containing 1 million docs. I would then pass a single
> query string of 500 OR terms to MultiSearcher,
> which would then start 10 threads to search
> the 10 different index directories.
> Has anyone tried something similar? Which solution
> would be the better one? Also, is using multiple threads
> on a single directory a good idea? Are there any
> bottlenecks for threads accessing resources, or would
> I be better off passing requests to different processes?
> Thanks a lot
>  --- Jie Yang wrote:
>>Thanks Julian
>>I am not using RAMDirectory due to the large size of
>>index file. the index generated on hard disc is
>>for 1 million documents, each document has average
>>terms. I am using Field.UnStored(fieldName, terms),
>>I believe I am not storing the documents, just the
>>index (is that right?). Is there any way to reduce
>>the index size created? Also, what is the maximum size of
>>data can be stored in RAMDirectory? I suppose I
>>get a 10G RAM solaris box, but would that be
>>say storing 2-3G of index data in memory? Also, what
>>is the performance boost factor when RAMDirectory
>>compared to FSDirectory? Are we talking about > 100%
>>On your 2nd and 3rd suggestion, I probably run the
>>latest code that includes the fix by Dmitry
>>Serebrennikov, the build was checked out from CVS
>>yesterday. and I used a QueryParser similar to the
>>used in the demo code.
>>Again, I still feel a bit curious and want to find
>>does lucene do (or in the future) pre-filter on "AND
>>join conditions". For example, A AND (B OR C OR D).
>>A finds 100 docs out of 1 million, can lucene
>>the searches on B, C, D only within the 100 docs found?
>>Thanks a lot.
>>>Response to: Poor Performance when searching for
>>>terms (Jie Yang) 
>>>From: Julien Nioche
>>>Subject: Poor Performance when searching for 500+
>>>Date: Thu, 13 Nov 2003 12:45:50 +0100
>>>Content-Type: text/plain; charset="iso-8859-1"
>>>Since there are a lot of Term objects in your
>>>your application must
>>>spend a lot of time collecting information about 
>>>those Terms.
>>>1/ Do you use RAMDirectory? Loading the whole 
>>>Directory into memory will
>>>increase speed - your index must not be too big
>>>2/ You are probably not using the QueryParser - so 
>>>when you are building the
>>>Query you could sort the Term objects inside a 
>>>BooleanQuery. Sorting the
>>>Terms will reduce jumps on disk. I have no
>>>for this, but
>>>logically, it should have some positive effect when
>>>using FSDirectory. Am I wrong?
>>>3/ There was a patch submitted by Dmitry
>>>which reduced garbage collecting by limiting the 
>>>creation of temporary Term objects. This patch has 
>>>not been included in Lucene code (a bug in it?).
>>>Hope it helps.