lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jie Yang <jyang_w...@yahoo.co.uk>
Subject Two possible solutions on Parallel Searching
Date Thu, 13 Nov 2003 16:00:53 GMT
I had a thought on my earlier post on "Poor
Performance when searching for 500+ terms". 

The problem is on how to improve the performance when
searching for 500+ OR search terms. i.e. enter a
search string of :

W1 OR W2 OR W3 OR ...... OR w500.

I thought I could rewrite the MultiSearcher class so
that it can initiate multiple parallel IndexSearchers
to perform the search.

Solution 1 would be divide the query string of "500 OR
conditions terms" into 25 "20 OR conditions terms",
and then pass them to MultiSearcher, MultiSearcher
then initiate 25 threads to search on a single index
directory.

Solution 2 would be when building an index of 1
million docs, instead of building one single index
containing 1 million docs,
build 10 index directory eaching containing 100K
records. then I pass a single query string of "500 OR
conditions terms" to
MultiSearcher, MultiSearcher then initiate 10 threads
to search for 10 different index directories. 

Has anyone tried something similar, which solution
would be a better one. Also is using multiple threads
on a single directory a good ideal? Are there any
bottlenecks for threads acessing resources, or
I better pass requests into different processes. 

Thanks a lot




 --- Jie Yang <jyang_work@yahoo.co.uk> wrote: > Thanks
Julian
> 
> I am not using RAMDirectory due to the large size of
> index file. the index generated on hard disc is
> 1.57G
> for 1 million documents, each document has average
> 500
> terms. I am using Field.UnStored(fieldName, terms),
> so
> i beliece I am not storing the documents, just the
> index. (is that right?) is there anyway to reduce
> the
> index size created? also What is the maximum size of
> data can be stored in RAMDirectory? I suppose I
> could
> get a 10G RAM solaris box, but would that be
> advisable
> say storing 2-3G of index data in memory? Also, what
> is the performance boost factor when RAMDirectory
> comparing to FSDirectory. Are we taling about > 100%
> here?
> 
> On your 2nd and 3rd suggestion, I probably run the
> latest code that includes the fix by Dmitry
> Serebrennikov, the build was checked out from CVS
> yesterday. and I used a QueryParser similar to the
> one
> used in the demo code.
> 
> Again, I still feel a bit curious and want to find
> out
> does lucene do (or in the future) pre-filter on "AND
> join conditions". For example, A AND (B OR C OR D).
> if
> A finds 100 docs out of 1 million, can lucene
> restrict
> the searchs on B,C,D only within the 100 docs found?
> 
> Thanks a lot.
> 
> 
> 
>  
> 
> 
> 
> >Response to: Poor Performance when searching for
> 500+
> >terms (Jie Yang) 
> 
> >From: Julien Nioche <Julien.Nioche@lingway.com>
> Subject: Poor Performance when searching for 500+
> terms
> >Date: Thu, 13 Nov 2003 12:45:50 +0100
> >Content-Type: text/plain; charset="iso-8859-1"
> >
> >Hello,
> >
> >Since there are a lot of Term objects in your
> Query, 
> >your application must
> >spend a lot of time collecting information about 
> >those Terms.
> >
> >1/ Do you use RAMDirectory? Loading the whole 
> >Directory into memory will
> >increase speed - your index must not be too big
> though
> >
> >2/ You are probably not using the QueryParser - so 
> >when you are building the
> >Query you could sort the Term objects inside a 
> >BooleanQuery. Sorting the
> >Terms will reduce jumps on disk. I have no
> benchmarks
> 
> >for this, but
> >logically, it should have some positive effect when
> 
> >using FSDirectory. Am I wrong?
> 
> >3/ There was a patch submitted by Dmitry
> Serebrennikov
>
>(http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg02762.html)
> >which reduced garbage collecting by limiting the 
> >creation of temporary Term objects. This patch has 
> >not been included in Lucene code (a bug in it?).
> >
> >Hope it helps.
> >
> >Julien
> 
> 
>
________________________________________________________________________
> Want to chat instantly with your online friends? 
> Get the FREE Yahoo!
> Messenger http://mail.messenger.yahoo.co.uk
>  

________________________________________________________________________
Want to chat instantly with your online friends?  Get the FREE Yahoo!
Messenger http://mail.messenger.yahoo.co.uk

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message