lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: BooleanQuery Performance Help
Date Sat, 20 Dec 2008 15:30:53 GMT
What specifically are you measuring when you time the queries? I've been
mislead by including in my measurement say, creating the response. I realize
that throughput includes assembling the response, but the solution is
different
depending upon whether it's the actual search or what you do with the
results that takes the time.

Are you doing any sorting?

Are you using a Hits object and iterating on it? This gets very inefficient.

You might post your code where you time the query. Also what do the "few
specific clauses" you remove look like? Do they have anything to do with
time?
How many unique values do the fields have that you remove to see the
improvement?

Do you start your timings *after* you've fired up a few warmup queries?

Best
Erick

On Sat, Dec 20, 2008 at 9:23 AM, Prafulla Kiran <prafulla@tachyontech.net>wrote:

> Hi Everyone,
>
> I have an index of relatively small size (400mb) , containing roughly 0.7
> million documents. The index is actually a copy of an existing database
> table. Hence, most of my queries are of the form
>
> " +field1:value1 +field2:value2 +field3:value3..... ~20 fields"
>
> I have been running performance tests using this query. Strangely, I
> noticed that if I remove some specific clauses... I get a performance
> improvement of atleast 5 times. Here are the numbers and examples, so that I
> could be more precise
>
> 1) Complete Query: 90 requests per second using 10 threads
> 2) If I remove few specific clauses : 500 requests per second using 10
> threads
> 3) If I form a new query using only 2 clauses from the set of removed
> clauses -> 100 requests per second using 10 threads
>
> Now, some of these specific clauses are such that they match around half of
> the entire document set.  Also, note that I need all the query terms to be
> present in the documents retrieved. My target is to obtain 300 requests per
> second with the given query (20 clauses). It includes 2 range queries.
> However, I am unable to get 300 rps unless I remove some of the clauses
> (which include these range queries) .
> I have tried using filters without any significant improvement in
> performance. Also, I have more than enough RAM, so I am using the
> RAMDirectory to read the index. I have optimized my index before searching.
> All the tests have been warmed for 5 seconds ( the test duration is 10
> seconds).
>
> My first question is, is this kind of decrease in performance expected as
> the number of clauses shoot up ? Using a single clause out of these 20 , I
> was able to get 2000 requests per second!
> Could someone please guide me if there are any other ways in which I can
> obtain improvement in performance ?
> Particularly, I am interested to know more about what further caching could
> be done apart from the default caching which lucene does.
>
> Thanks In Advance,
> Prafulla
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message