lucene-java-user mailing list archives

From "Julien Nioche" <Julien.Nio...@lingway.com>
Subject Re: Poor Performance when searching for 500+ terms
Date Thu, 13 Nov 2003 11:45:50 GMT
Hello,

Since there are a lot of Term objects in your Query, your application must
spend a lot of time collecting information about those Terms.

1/ Do you use RAMDirectory? Loading the whole Directory into memory will
increase speed, provided your index is small enough to fit in memory.

2/ You are probably not using the QueryParser, so when you build the Query
you could sort the Term objects before adding them to the BooleanQuery.
Looking the Terms up in sorted order should reduce jumps on disk. I have no
benchmarks for this, but logically it should have some positive effect when
using FSDirectory. Am I wrong?
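
As a rough illustration of point 2, here is a minimal sketch in plain Java,
with no Lucene dependency. FieldTerm and sortTerms are hypothetical names
standing in for Lucene's Term and for whatever code builds the query. The
assumption is that the index keeps its terms ordered by field name and then
by term text, so the clauses are put in that same order before they are
added to the BooleanQuery. Whether this helps in practice is untested.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortedTermsSketch {

    /** Hypothetical stand-in for a Lucene Term: a field name plus the term text. */
    static final class FieldTerm {
        final String field;
        final String text;

        FieldTerm(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public String toString() {
            return field + ":" + text;
        }
    }

    /**
     * Returns a new list ordered by field name, then by term text.
     * Assumption: this matches the order of the on-disk term dictionary.
     * The input list is left unchanged.
     */
    static List<FieldTerm> sortTerms(List<FieldTerm> terms) {
        List<FieldTerm> sorted = new ArrayList<>(terms);
        sorted.sort(Comparator.comparing((FieldTerm t) -> t.field)
                              .thenComparing(t -> t.text));
        return sorted;
    }

    public static void main(String[] args) {
        List<FieldTerm> terms = new ArrayList<>();
        terms.add(new FieldTerm("field1", "w3"));
        terms.add(new FieldTerm("field1", "w1"));
        terms.add(new FieldTerm("field1", "w2"));

        // Print the terms in the order in which the clauses would be added.
        for (FieldTerm t : sortTerms(terms)) {
            System.out.println(t);
        }
    }
}
```

With Lucene on the classpath, each sorted entry would be wrapped in a
TermQuery and added to the BooleanQuery as an optional clause, in this order.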

3/ There was a patch submitted by Dmitry Serebrennikov
(http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg02762.html)
which reduced garbage collection by limiting the creation of temporary Term
objects. This patch has not been included in the Lucene code (perhaps
because of a bug in it?).

Hope it helps.

Julien

----- Original Message -----
From: "Jie Yang" <jyang_work@yahoo.co.uk>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Wednesday, November 12, 2003 10:11 PM
Subject: Poor Performance when searching for 500+ terms


> I know this is rare, but I am building an application
> that submits searches with 500+ search terms. A
> general example would be:
>
> field1:w1 OR field1:w2 OR ... OR field1:w500
>
> For 1 million documents, the performance is OK if
> field1 in each document has fewer than 50 terms: I
> get results in < 1 sec. But if field1 averages more
> than 400 terms per document, the performance
> degrades to around 6 secs.
>
> Is there any way to improve this?
>
> My second question is that my query often comes
> with an AND condition on another search word, for
> example:
>
> field2:w AND (field1:w1 OR field1:w2 OR ... OR field1:w500)
>
> field2:w will return fewer than 1000 records out
> of 1 million, so I thought I could use a
> StringFilter object, i.e. search on field2:w first,
> and so limit the 500-term OR search to the 1000
> field2:w results, somewhat like a join in a database.
> But I checked the code and saw that IndexSearcher always
> performs the 500 disk searches before calling the
> filter object. Any suggestions on this?
>
> Also, does Lucene cache results in memory? I see that
> performance tends to get better after a few runs,
> especially for searches on fields with a small number
> of terms. If so, can I adjust the cache size
> somehow to accommodate fields with a large number of
> terms?
>
> Many thanks.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
>
>

