lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Prafulla Kiran <>
Subject BooleanQuery Performance Help
Date Sat, 20 Dec 2008 14:23:43 GMT
Hi Everyone,

I have an index of relatively small size (400mb) , containing roughly 
0.7 million documents. The index is actually a copy of an existing 
database table. Hence, most of my queries are of the form

" +field1:value1 +field2:value2 +field3:value3..... ~20 fields"

I have been running performance tests using this query. Strangely, I 
noticed that if I remove some specific clauses... I get a performance 
improvement of atleast 5 times. Here are the numbers and examples, so 
that I could be more precise

1) Complete Query: 90 requests per second using 10 threads
2) If I remove few specific clauses : 500 requests per second using 10 
3) If I form a new query using only 2 clauses from the set of removed 
clauses -> 100 requests per second using 10 threads

Now, some of these specific clauses are such that they match around half 
of the entire document set.  Also, note that I need all the query terms 
to be present in the documents retrieved. My target is to obtain 300 
requests per second with the given query (20 clauses). It includes 2 
range queries. However, I am unable to get 300 rps unless I remove some 
of the clauses (which include these range queries) .
I have tried using filters without any significant improvement in 
performance. Also, I have more than enough RAM, so I am using the 
RAMDirectory to read the index. I have optimized my index before 
searching. All the tests have been warmed for 5 seconds ( the test 
duration is 10 seconds).

My first question is, is this kind of decrease in performance expected 
as the number of clauses shoot up ? Using a single clause out of these 
20 , I was able to get 2000 requests per second!
Could someone please guide me if there are any other ways in which I can 
obtain improvement in performance ?
Particularly, I am interested to know more about what further caching 
could be done apart from the default caching which lucene does.

Thanks In Advance,

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message