lucene-java-user mailing list archives

From "Julien Nioche"
Subject Re: Poor Performance when searching for 500+ terms
Date Thu, 13 Nov 2003 11:45:50 GMT

Since your Query contains a lot of Term objects, your application must
spend a lot of time collecting information about each of those Terms.

1/ Do you use a RAMDirectory? Loading the whole Directory into memory will
increase speed, though your index must not be too big.

2/ You are probably not using the QueryParser, so when you build the
Query yourself you could sort the Term objects before adding them to the
BooleanQuery. Sorting the Terms should reduce jumps on disk. I have no
benchmarks for this, but logically it should have some positive effect when
using FSDirectory. Am I right?
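Point 2 can be sketched without Lucene on the classpath: collect the (field, text) pairs, sort them the way Lucene's `Term.compareTo` orders terms (by field name, then by text), and only then turn each one into a clause. The `FieldTerm` class below is a hypothetical stand-in for `org.apache.lucene.index.Term`, and the commented `BooleanQuery.add(query, required, prohibited)` call assumes the Lucene 1.x API of the time. Whether sorting really reduces seeks remains unbenchmarked, as noted above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedTermsSketch {

    /** Hypothetical stand-in for org.apache.lucene.index.Term: a (field, text) pair. */
    static final class FieldTerm implements Comparable<FieldTerm> {
        final String field;
        final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        // Term-dictionary order: compare field names first, then the term text.
        @Override
        public int compareTo(FieldTerm other) {
            int byField = field.compareTo(other.field);
            return byField != 0 ? byField : text.compareTo(other.text);
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    /** Returns a sorted copy, leaving the caller's list untouched. */
    static List<FieldTerm> sortForQuery(List<FieldTerm> terms) {
        List<FieldTerm> sorted = new ArrayList<>(terms);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        List<FieldTerm> terms = new ArrayList<>();
        terms.add(new FieldTerm("field1", "w500"));
        terms.add(new FieldTerm("field1", "w2"));
        terms.add(new FieldTerm("field1", "w1"));

        for (FieldTerm t : sortForQuery(terms)) {
            // Assumed Lucene 1.x usage: each sorted term becomes an optional clause, e.g.
            // query.add(new TermQuery(new Term(t.field, t.text)), false, false);
            System.out.println(t);
        }
    }
}
```

The order is lexicographic, not numeric, so a term such as `w10` would sort before `w2`. That is fine here, since the goal is to follow the index's own term order.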

3/ There was a patch submitted by Dmitry Serebrennikov
which reduced garbage collection by limiting the creation of temporary Term
objects. This patch has not been included in the Lucene code (perhaps because
of a bug in it?).

Hope it helps.


----- Original Message -----
From: "Jie Yang"
To: "Lucene Users List"
Sent: Wednesday, November 12, 2003 10:11 PM
Subject: Poor Performance when searching for 500+ terms

> I know this is rare, but I am building an application
> that submits searches with 500+ search terms. A
> general example would be
> field1:w1 OR field1:w2 OR ... OR field1:w500
> For 1 million documents, the performance is OK if
> field1 in each document has fewer than 50 terms: I can
> get results in < 1 sec. But if field1 has an average of
> more than 400 terms per document, the performance
> degrades to around 6 secs.
> Is there any way to improve this?
> My second question is that my query often comes
> with an AND condition on another search word, for
> example:
> field2:w AND (field1:w1 OR field1:w2 OR ... OR field1:w500)
> field2:w will only return fewer than 1000 records out
> of 1 million, so I thought I could use a
> StringFilter object, i.e. search on field2:w first,
> thus limiting the 500-term OR search to those 1000
> field2:w results, somewhat like a join in a database. But I
> checked the code and see that IndexSearcher always
> performs the 500 disk searches before calling the
> filter object. Any suggestions on this?
> Also, does Lucene cache results in memory? I see the
> performance tends to get better after a few runs,
> especially on searches on fields with a small number
> of terms. If so, can I manipulate the cache size
> somehow to accommodate fields with a large number of
> terms?
> Many thanks.
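The "join" Jie describes (evaluate the selective `field2:w` clause first, then check only those few documents against the 500 OR'ed terms) can be illustrated on plain in-memory arrays. This is a sketch of the evaluation order only, not of how `IndexSearcher` works: the postings arrays, the method name `intersect` and the match-count "score" are all hypothetical. Doing this inside Lucene would need its own Query/Scorer or filter code, which has not been checked against the 1.3 API here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CandidateFirstSketch {

    /**
     * For each candidate document (e.g. the hits of field2:w), counts how many of the
     * OR'ed terms' postings contain it. Candidates matching no term are dropped, which
     * mirrors field2:w AND (field1:w1 OR ... OR field1:w500).
     *
     * @param candidates sorted doc ids matching the selective clause
     * @param postings   one sorted doc-id array per OR'ed term (hypothetical in-memory postings)
     * @return pairs {docId, matchingTermCount} in candidate order
     */
    static List<int[]> intersect(int[] candidates, List<int[]> postings) {
        List<int[]> hits = new ArrayList<>();
        for (int doc : candidates) {
            int matched = 0;
            for (int[] list : postings) {
                // Probe each postings list for this one doc instead of scanning it fully.
                if (Arrays.binarySearch(list, doc) >= 0) {
                    matched++;
                }
            }
            if (matched > 0) {
                hits.add(new int[] {doc, matched});
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int[] field2Hits = {3, 17, 42};           // small result of field2:w
        List<int[]> field1Postings = Arrays.asList(
            new int[] {1, 3, 5, 42, 99},          // field1:w1
            new int[] {2, 42, 100},               // field1:w2
            new int[] {4, 8, 16});                // field1:w3
        for (int[] hit : intersect(field2Hits, field1Postings)) {
            System.out.println("doc " + hit[0] + " matches " + hit[1] + " term(s)");
        }
    }
}
```

With about 1000 candidates and 500 terms this is roughly 500,000 binary searches, rather than reading every posting of every term. The trade-off only pays off while the candidate set stays small compared with the postings lists.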
