lucene-java-user mailing list archives

From Prafulla Kiran <prafu...@tachyontech.net>
Subject Re: BooleanQuery Performance Help
Date Mon, 22 Dec 2008 03:31:12 GMT
Hi,

Here's the code which I am using to time the query:

// is: the searcher; query: the BooleanQuery under test
long startTime = System.currentTimeMillis();
TopDocCollector collector = new TopDocCollector(10); // keep only the top 10 hits
is.search(query, collector);
ScoreDoc[] hits = collector.topDocs().scoreDocs;
long endTime = System.currentTimeMillis();
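
For comparison, here is a minimal sketch of a timing helper that keeps the warmup runs out of the measurement and averages over many timed runs. It is plain Java with no Lucene dependency: the class QueryTimer, the method meanNanos and the fakeSearch stand-in are hypothetical names, not part of the code in this thread. It uses System.nanoTime() on the assumption that a single search may take only a few milliseconds, where millisecond resolution gets coarse.

```java
import java.util.function.Supplier;

class QueryTimer {
    /**
     * Runs the task warmupRuns times without timing it, then timedRuns
     * times with timing, and returns the mean nanoseconds per timed run.
     */
    static <T> long meanNanos(Supplier<T> task, int warmupRuns, int timedRuns) {
        for (int i = 0; i < warmupRuns; i++) {
            task.get(); // warm caches and the JIT; deliberately not timed
        }
        long total = 0;
        for (int i = 0; i < timedRuns; i++) {
            long start = System.nanoTime();
            T result = task.get(); // only the search itself is inside the timer
            total += System.nanoTime() - start;
            if (result == null) {
                throw new IllegalStateException("task returned null");
            }
        }
        return total / timedRuns;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for is.search(query, collector).
        Supplier<Integer> fakeSearch = () -> {
            int sum = 0;
            for (int i = 0; i < 100_000; i++) {
                sum += i % 7;
            }
            return sum;
        };
        long mean = meanNanos(fakeSearch, 50, 200);
        System.out.println("mean ns per search: " + mean);
    }
}
```

In a real test the Supplier would wrap the search call and return the ScoreDoc array, so that building any response from the hits stays outside the timed region.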

Most of the clauses I removed had very few unique terms, say 2 or 3.
I started taking the timings only after firing the warmup queries.
Also, I am not doing any kind of sorting or iterating through a Hits
object.
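
To illustrate the kind of caching that is sometimes considered for clauses with so few unique terms, here is a rough sketch that precomputes one bit set per field:value pair and intersects them for a conjunctive query. This is the general idea behind cached filters such as Lucene's CachingWrapperFilter, but the code uses only java.util.BitSet and is not Lucene API; ClauseBitSetCache, add and matchAll are hypothetical names. Since filters reportedly gave no significant improvement in the tests described in this thread, treat it as an illustration of the technique rather than a fix.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

class ClauseBitSetCache {
    private final Map<String, BitSet> cache = new HashMap<>();
    private final int maxDoc;

    ClauseBitSetCache(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Records that document docId contains field:value. */
    void add(String field, String value, int docId) {
        cache.computeIfAbsent(field + ":" + value, k -> new BitSet(maxDoc)).set(docId);
    }

    /** Returns the documents that match every given "field:value" clause. */
    BitSet matchAll(String... clauses) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc); // start from "all documents"
        for (String clause : clauses) {
            BitSet bits = cache.get(clause);
            if (bits == null) { // a clause nobody indexed matches nothing
                result.clear();
                return result;
            }
            result.and(bits); // intersect with this clause's documents
        }
        return result;
    }

    public static void main(String[] args) {
        ClauseBitSetCache c = new ClauseBitSetCache(6);
        for (int doc = 0; doc < 6; doc++) {
            c.add("status", doc % 2 == 0 ? "active" : "inactive", doc);
            c.add("type", doc < 3 ? "a" : "b", doc);
        }
        System.out.println(c.matchAll("status:active", "type:a")); // prints {0, 2}
    }
}
```

Each intersection costs time proportional to the number of documents, not to the number of matches, so this pays off mainly for clauses that match a large share of the index. The cache should be fully populated before several threads read it.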

Regards,
Prafulla

Erick Erickson wrote:
> What specifically are you measuring when you time the queries? I've been
> misled by including in my measurement, say, creating the response. I realize
> that throughput includes assembling the response, but the solution is
> different depending upon whether it's the actual search or what you do with
> the results that takes the time.
>
> Are you doing any sorting?
>
> Are you using a Hits object and iterating on it? This gets very inefficient.
>
> You might post your code where you time the query. Also, what do the "few
> specific clauses" you remove look like? Do they have anything to do with
> time? How many unique values do the fields have that you remove to see the
> improvement?
>
> Do you start your timings *after* you've fired up a few warmup queries?
>
> Best
> Erick
>
> On Sat, Dec 20, 2008 at 9:23 AM, Prafulla Kiran <prafulla@tachyontech.net> wrote:
>
>   
>> Hi Everyone,
>>
>> I have an index of relatively small size (400 MB), containing roughly 0.7
>> million documents. The index is actually a copy of an existing database
>> table. Hence, most of my queries are of the form
>>
>> " +field1:value1 +field2:value2 +field3:value3..... ~20 fields"
>>
>> I have been running performance tests using this query. Strangely, I
>> noticed that if I remove some specific clauses, I get a performance
>> improvement of at least 5 times. Here are the numbers and examples, to be
>> more precise:
>>
>> 1) Complete query: 90 requests per second using 10 threads
>> 2) If I remove a few specific clauses: 500 requests per second using 10
>> threads
>> 3) If I form a new query using only 2 clauses from the set of removed
>> clauses: 100 requests per second using 10 threads
>>
>> Now, some of these specific clauses are such that they match around half of
>> the entire document set. Also, note that I need all the query terms to be
>> present in the documents retrieved. My target is to obtain 300 requests per
>> second with the given query (20 clauses). It includes 2 range queries.
>> However, I am unable to get 300 rps unless I remove some of the clauses
>> (which include these range queries).
>> I have tried using filters without any significant improvement in
>> performance. Also, I have more than enough RAM, so I am using the
>> RAMDirectory to read the index. I have optimized my index before searching.
>> All the tests have been warmed up for 5 seconds (the test duration is 10
>> seconds).
>>
>> My first question is: is this kind of decrease in performance expected as
>> the number of clauses shoots up? Using a single clause out of these 20, I
>> was able to get 2000 requests per second!
>> Could someone please guide me if there are any other ways in which I can
>> obtain an improvement in performance?
>> In particular, I am interested to know more about what further caching could
>> be done apart from the default caching which Lucene does.
>>
>> Thanks In Advance,
>> Prafulla
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org



