lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Russell M. Allen" <Russell.Al...@aebn.net>
Subject RE: About search performance
Date Mon, 31 Jul 2006 13:24:11 GMT
You should build your own performance test cases to see what works for your data.  That being
said, here are some numbers from a similar test I ran:

I did the following:
1) run a single term query which resulted in about half of the total set of documents being
returned.  (~36,000)
2) built a Boolean query (tree of them actually) with a Boolean clause for each of the 36,000
documents.
3) run this new secondary query to get 'final' results.

The time it took to run each step was:
1) 31 ms
2) 1610 ms
3) 1047 ms

Without the giant Boolean query attached to the final query, stem three takes about 15 ms.
 So... Yes, a large query structure will increase query execution time.  In researching how
to reduce this, I found filters to be the fastest.  Using filters:

Steps required:
1) run the initial single term query.
2) Build a query object model based on results from step 1.
3) Build bitset filter using query from step 2.  CACHE THE RESULTS!
4) Execute main query, with cached bitset filter.

Execution time:
1) 16 ms
2) 1609 ms
3) 969 ms
4) 15 ms

As you can see, the total time is longer, but, in our specific case, the initial query rarely
changes.  Thus, by caching the filter results, the vast majority of queries are done in 15
ms.  This only works because we are not updating our indexes that often, and the initial query
is fairly static too.

This is a classic trade.  I am trading space for time.  Each cached filter costs around 2k
(reflection of index size) of memory to keep around.  In turn, that saves me a little over
a second from the final query.

I hope these number help, and perhaps spark an idea.

-Russell

-----Original Message-----
From: zhongyi yuan [mailto:yzy1980@gmail.com] 
Sent: Friday, July 28, 2006 9:47 PM
To: java-user@lucene.apache.org
Subject: About search performance

Hi,How about implement multi-key search use lucene, for example use boolean search exceed
1000 clauses,it will affect the performance greatly. If use filter or custom sorter to select
the result, because the result is extremely large in amount,so the performance is lower.
Please give me some advice,Thanks.
Mime
View raw message