lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <>
Subject Re: Poor Performance when searching for 500+ terms
Date Thu, 13 Nov 2003 15:49:04 GMT
> I am not using RAMDirectory due to the large size of
> index file. the index generated on hard disc is 1.57G
> for 1 million documents, each document has average 500
> terms. I am using Field.UnStored(fieldName, terms), so
> i beliece I am not storing the documents, just the
> index. (is that right?) is there anyway to reduce the


> index size created? also What is the maximum size of
> data can be stored in RAMDirectory? I suppose I could

Depends on how much RAM you have.

> get a 10G RAM solaris box, but would that be advisable
> say storing 2-3G of index data in memory?

Yes, why not.

> Also, what
> is the performance boost factor when RAMDirectory
> comparing to FSDirectory. Are we taling about > 100%
> here?

I don't know if I can qualify it like that, but one of the Lucene
articles (Resources section on Lucene's site) covers RAMDirectory and
shows performance differences.

> On your 2nd and 3rd suggestion, I probably run the
> latest code that includes the fix by Dmitry
> Serebrennikov, the build was checked out from CVS
> yesterday. and I used a QueryParser similar to the one
> used in the demo code.

Dmitry's code was never included in CVS.  I don't know/remember why.

> Again, I still feel a bit curious and want to find out
> does lucene do (or in the future) pre-filter on "AND
> join conditions". For example, A AND (B OR C OR D). if
> A finds 100 docs out of 1 million, can lucene restrict
> the searchs on B,C,D only within the 100 docs found?

Ah, this was mentioned recently on the list.
I don't remember what the conclusion was, but Erik did remind us of
Lucene's filters which can help in some situations.


> >Response to: Poor Performance when searching for 500+
> >terms (Jie Yang) 
> >From: Julien Nioche <>
> Subject: Poor Performance when searching for 500+
> terms
> >Date: Thu, 13 Nov 2003 12:45:50 +0100
> >Content-Type: text/plain; charset="iso-8859-1"
> >
> >Hello,
> >
> >Since there are a lot of Term objects in your Query, 
> >your application must
> >spend a lot of time collecting information about 
> >those Terms.
> >
> >1/ Do you use RAMDirectory? Loading the whole 
> >Directory into memory will
> >increase speed - your index must not be too big
> though
> >
> >2/ You are probably not using the QueryParser - so 
> >when you are building the
> >Query you could sort the Term objects inside a 
> >BooleanQuery. Sorting the
> >Terms will reduce jumps on disk. I have no benchmarks
> >for this, but
> >logically, it should have some positive effect when 
> >using FSDirectory. Am I wrong?
> >3/ There was a patch submitted by Dmitry
> Serebrennikov
> >which reduced garbage collecting by limiting the 
> >creation of temporary Term objects. This patch has 
> >not been included in Lucene code (a bug in it?).
> >
> >Hope it helps.
> >
> >Julien
> Want to chat instantly with your online friends?  Get the FREE Yahoo!
> Messenger
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Do you Yahoo!?
Protect your identity with Yahoo! Mail AddressGuard

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message