lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jie Yang <jyang_w...@yahoo.co.uk>
Subject Re: Poor Performance when searching for 500+ terms
Date Thu, 13 Nov 2003 14:35:00 GMT
Thanks Julian

I am not using RAMDirectory due to the large size of
index file. the index generated on hard disc is 1.57G
for 1 million documents, each document has average 500
terms. I am using Field.UnStored(fieldName, terms), so
i beliece I am not storing the documents, just the
index. (is that right?) is there anyway to reduce the
index size created? also What is the maximum size of
data can be stored in RAMDirectory? I suppose I could
get a 10G RAM solaris box, but would that be advisable
say storing 2-3G of index data in memory? Also, what
is the performance boost factor when RAMDirectory
comparing to FSDirectory. Are we taling about > 100%
here?

On your 2nd and 3rd suggestion, I probably run the
latest code that includes the fix by Dmitry
Serebrennikov, the build was checked out from CVS
yesterday. and I used a QueryParser similar to the one
used in the demo code.

Again, I still feel a bit curious and want to find out
does lucene do (or in the future) pre-filter on "AND
join conditions". For example, A AND (B OR C OR D). if
A finds 100 docs out of 1 million, can lucene restrict
the searchs on B,C,D only within the 100 docs found?

Thanks a lot.



 



>Response to: Poor Performance when searching for 500+
>terms (Jie Yang) 

>From: Julien Nioche <Julien.Nioche@lingway.com>
Subject: Poor Performance when searching for 500+
terms
>Date: Thu, 13 Nov 2003 12:45:50 +0100
>Content-Type: text/plain; charset="iso-8859-1"
>
>Hello,
>
>Since there are a lot of Term objects in your Query, 
>your application must
>spend a lot of time collecting information about 
>those Terms.
>
>1/ Do you use RAMDirectory? Loading the whole 
>Directory into memory will
>increase speed - your index must not be too big
though
>
>2/ You are probably not using the QueryParser - so 
>when you are building the
>Query you could sort the Term objects inside a 
>BooleanQuery. Sorting the
>Terms will reduce jumps on disk. I have no benchmarks

>for this, but
>logically, it should have some positive effect when 
>using FSDirectory. Am I wrong?

>3/ There was a patch submitted by Dmitry
Serebrennikov
>(http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg02762.html)
>which reduced garbage collecting by limiting the 
>creation of temporary Term objects. This patch has 
>not been included in Lucene code (a bug in it?).
>
>Hope it helps.
>
>Julien


________________________________________________________________________
Want to chat instantly with your online friends?  Get the FREE Yahoo!
Messenger http://mail.messenger.yahoo.co.uk

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message