lucene-java-user mailing list archives

From "William W" <william_...@hotmail.com>
Subject Re: Two possible solutions on Parallel Searching
Date Thu, 13 Nov 2003 17:58:17 GMT
Hi Folks,

If I have two indexes and use MultiSearcher, will it be faster than a
single index containing all the documents?
Thanks,
William.

>From: Doug Cutting <cutting@lucene.com>
>Reply-To: "Lucene Users List" <lucene-user@jakarta.apache.org>
>To: Lucene Users List <lucene-user@jakarta.apache.org>
>Subject: Re: Two possible solutions on Parallel Searching
>Date: Thu, 13 Nov 2003 09:16:38 -0800
>
>First, note that the approaches you describe will only improve performance 
>if you have multiple CPUs and/or multiple disks holding the indexes.
>
>Second, MultiSearcher is currently implemented to search indexes serially, 
>not each in a separate thread.  To implement multi-threaded searching one 
>could subclass MultiSearcher with, e.g., ParallelSearcher, and override the 
>search() methods with multi-threaded implementations. This would be a great 
>contribution if someone is interested!
>
>The parallel approach I prefer is to maintain a set of indexes, each on a 
>separate machine, then use something like a ParallelSearcher of 
>RemoteSearchables to search them all.
>
>Doug
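
A minimal sketch of the thread-per-index idea Doug describes, written in
plain Java rather than against the Lucene API. The names Shard, Hit and
searchAll are hypothetical and introduced only for illustration; they are
not Lucene classes. The sketch also assumes that scores coming from
different indexes are directly comparable, which may not hold for a real
scorer.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelShardSearch {
    // Hypothetical scored hit; doc is local to the index identified by shard.
    static final class Hit {
        final int shard;
        final int doc;
        final double score;
        Hit(int shard, int doc, double score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    // Hypothetical stand-in for one searchable index.
    interface Shard {
        List<Hit> search(String query, int topN);
    }

    // Searches every shard in its own thread, then merges the best topN hits.
    static List<Hit> searchAll(List<Shard> shards, String query, int topN) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                pending.add(pool.submit(() -> shard.search(query, topN)));
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> result : pending) {
                merged.addAll(result.get()); // blocks until that shard is done
            }
            // Highest score first; assumes scores are comparable across shards.
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("shard search failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

As Doug notes, the extra threads can only pay off when the indexes sit on
separate CPUs and/or disks; on one CPU and one disk this would likely be
no faster than the serial loop.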
>
>Jie Yang wrote:
>>I had a thought on my earlier post on "Poor
>>Performance when searching for 500+ terms".
>>
>>The problem is how to improve performance when
>>searching for 500+ OR'ed search terms, i.e. entering a
>>search string such as:
>>
>>W1 OR W2 OR W3 OR ...... OR W500.
>>
>>I thought I could rewrite the MultiSearcher class so
>>that it can initiate multiple parallel IndexSearchers
>>to perform the search.
>>
>>Solution 1 would be to divide the query string of 500 OR'ed
>>terms into 25 queries of 20 OR'ed terms each, and then pass
>>them to MultiSearcher. MultiSearcher would then start 25
>>threads to search a single index directory.
>>
>>Solution 2 would be, when building an index of 1
>>million docs, to build 10 index directories each
>>containing 100K records instead of one single index
>>containing all 1 million docs. I would then pass a single
>>query string of 500 OR'ed terms to MultiSearcher, and
>>MultiSearcher would start 10 threads to search the 10
>>different index directories.
>>
>>Has anyone tried something similar? Which solution
>>would be the better one? Also, is using multiple threads
>>on a single directory a good idea? Are there any
>>bottlenecks for threads accessing resources, or would
>>I do better to pass requests to different processes?
>>
>>Thanks a lot
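
Solution 1 above can be sketched as follows, again in plain Java and not
against the Lucene API. The "index" here is a toy map from term to document
ids, and the scoring rule (one point per matching term) is an assumption
made for illustration, not Lucene's scoring. ChunkedOrQuery, partition,
searchGroup and search are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ChunkedOrQuery {
    // Splits the term list into consecutive groups of at most groupSize terms.
    static List<List<String>> partition(List<String> terms, int groupSize) {
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < terms.size(); i += groupSize) {
            groups.add(terms.subList(i, Math.min(i + groupSize, terms.size())));
        }
        return groups;
    }

    // Evaluates one small OR query: each matching term adds one point to a doc.
    static Map<Integer, Integer> searchGroup(Map<String, Set<Integer>> index,
                                             List<String> group) {
        Map<Integer, Integer> scores = new HashMap<>();
        for (String term : group) {
            for (int doc : index.getOrDefault(term, Collections.<Integer>emptySet())) {
                scores.merge(doc, 1, Integer::sum);
            }
        }
        return scores;
    }

    // Runs the groups in parallel against the same (read-only) index and
    // sums the partial scores per document.
    static Map<Integer, Integer> search(Map<String, Set<Integer>> index,
                                        List<String> terms, int groupSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<Integer, Integer>>> pending = new ArrayList<>();
            for (List<String> group : partition(terms, groupSize)) {
                pending.add(pool.submit(() -> searchGroup(index, group)));
            }
            Map<Integer, Integer> total = new HashMap<>();
            for (Future<Map<Integer, Integer>> partial : pending) {
                partial.get().forEach((doc, score) -> total.merge(doc, score, Integer::sum));
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("group search failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With 500 terms and a group size of 20, partition yields the 25 sub-queries
described above. Whether 25 threads on a single index directory actually
help is exactly the open question; the sketch only shows the split and the
merge.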
>>
>>
>>
>>
>> --- Jie Yang <jyang_work@yahoo.co.uk> wrote:
>>>Thanks Julien
>>>
>>>I am not using RAMDirectory because of the large size of the index. The
>>>index generated on hard disk is 1.57G for 1 million documents; each
>>>document has 500 terms on average. I am using
>>>Field.UnStored(fieldName, terms), so I believe I am not storing the
>>>documents, just the index. (Is that right?) Is there any way to reduce
>>>the size of the index created? Also, what is the maximum size of data
>>>that can be stored in a RAMDirectory? I suppose I could get a Solaris box
>>>with 10G of RAM, but would it be advisable to store, say, 2-3G of index
>>>data in memory? Also, what is the performance boost factor of
>>>RAMDirectory compared to FSDirectory? Are we talking about > 100% here?
>>>
>>>On your 2nd and 3rd suggestions: I am probably running the latest code,
>>>which includes the fix by Dmitry Serebrennikov; the build was checked out
>>>from CVS yesterday. I used a QueryParser similar to the one used in the
>>>demo code.
>>>
>>>Again, I am still curious whether Lucene does (or will in the future)
>>>pre-filter on "AND join conditions". For example, A AND (B OR C OR D):
>>>if A finds 100 docs out of 1 million, can Lucene restrict the searches
>>>on B, C, D to only the 100 docs found?
>>>
>>>Thanks a lot.
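
To make the pre-filter question concrete, here is a small sketch of the
idea for A AND (B OR C OR D), using java.util.BitSet as a stand-in for the
set of documents matching each term. It makes no claim about how Lucene
evaluates boolean queries; AndPrefilter and restrictToRequired are
hypothetical names.

```java
import java.util.BitSet;
import java.util.List;

final class AndPrefilter {
    // Visits only the documents matched by the required clause (A) and keeps
    // those that also appear in at least one optional clause (B, C, D, ...).
    static BitSet restrictToRequired(BitSet required, List<BitSet> optional) {
        BitSet result = new BitSet();
        for (int doc = required.nextSetBit(0); doc >= 0; doc = required.nextSetBit(doc + 1)) {
            for (BitSet clause : optional) {
                if (clause.get(doc)) {
                    result.set(doc);
                    break; // one matching optional clause is enough
                }
            }
        }
        return result;
    }
}
```

The loop does work proportional to the number of documents matching A
times the number of optional clauses, so with 100 matches for A it never
looks at the rest of the 1 million documents.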
>>>
>>>>Response to: Poor Performance when searching for 500+ terms (Jie Yang)
>>>>From: Julien Nioche <Julien.Nioche@lingway.com>
>>>>Subject: Poor Performance when searching for 500+ terms
>>>>Date: Thu, 13 Nov 2003 12:45:50 +0100
>>>>Content-Type: text/plain; charset="iso-8859-1"
>>>>
>>>>Hello,
>>>>
>>>>Since there are a lot of Term objects in your Query, your application must
>>>>spend a lot of time collecting information about those Terms.
>>>>
>>>>1/ Do you use RAMDirectory? Loading the whole Directory into memory will
>>>>increase speed - your index must not be too big though
>>>>
>>>>2/ You are probably not using the QueryParser - so when you are building
>>>>the Query you could sort the Term objects inside a BooleanQuery. Sorting the
>>>>Terms will reduce jumps on disk. I have no benchmarks for this, but
>>>>logically, it should have some positive effect when using FSDirectory. Am I
>>>>wrong?
>>>>
>>>>3/ There was a patch submitted by Dmitry Serebrennikov
>>>>(http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg02762.html)
>>>>which reduced garbage collecting by limiting the creation of temporary
>>>>Term objects. This patch has not been included in Lucene code (a bug in
>>>>it?).
>>>>
>>>>Hope it helps.
>>>>
>>>>Julien
>>>
>>>
>>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
>>For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>>
>
>
>


