lucene-java-user mailing list archives

From Teruhiko Kurosaka <K...@basistech.com>
Subject RE: Lucene performance: is search time linear to the index size?
Date Thu, 18 Jun 2009 19:54:51 GMT
Erick,
I test this program by issuing 1000 queries, and I have profiled
it to make sure the start-up cost is negligible.
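
A minimal sketch of that kind of measurement, assuming the query can be wrapped in a callback. The class name WarmedTimer and its method are hypothetical helpers, not part of Lucene. It runs a number of unmeasured warm-up queries first, then times only the ones that follow, so start-up and cache-loading costs stay out of the average.

```java
import java.util.function.IntConsumer;

// Hypothetical helper, not part of Lucene: times only the queries
// issued after an unmeasured warm-up phase.
final class WarmedTimer {
    // Runs `query` `warmup` times without timing, then `measured` times
    // under the clock. The callback receives the running query number.
    // Returns the mean wall-clock time per measured query, in nanoseconds.
    static double meanNanos(IntConsumer query, int warmup, int measured) {
        for (int i = 0; i < warmup; i++) {
            query.accept(i);              // warm-up: result and time discarded
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            query.accept(warmup + i);     // measured queries
        }
        long elapsed = System.nanoTime() - start;
        return (double) elapsed / measured;
    }
}
```

The callback would wrap whatever search call is under test, against a searcher that is opened once and reused for every query.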

I ran a further test and discovered that the search time is actually
proportional to the number of potential hits.  (I say "potential
hits" because I limit the number of hits returned by specifying
the "n" parameter of the search method.)

Because the number of hits was proportional to the number
of Documents in the index in my previous test, I came
to the wrong conclusion that the search time is proportional
to the index size.  If only one Document can match a query,
the search time remains constant no matter how large the
index is.
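
A toy model of this observation, assuming a simple in-memory postings list per term. This is not Lucene's actual implementation, and every name in it is hypothetical. It only shows why a term search that keeps the top "n" results still has to visit every document that matches the term, while the rest of the index costs nothing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Toy model only, NOT Lucene's implementation. All names are hypothetical.
final class ToyInvertedIndex {
    // term -> postings, each posting being {docId, termFrequency}
    private final Map<String, List<int[]>> postings = new HashMap<>();
    // Number of postings visited by the most recent search.
    long lastVisited = 0;

    void add(int docId, String term, int freq) {
        postings.computeIfAbsent(term, t -> new ArrayList<>())
                .add(new int[] {docId, freq});
    }

    // Returns the ids of the top-n documents for the term, highest
    // term frequency first (frequency stands in for a real score).
    List<Integer> search(String term, int n) {
        lastVisited = 0;
        // Min-heap on score: the weakest of the current top n is evicted first.
        PriorityQueue<int[]> top =
                new PriorityQueue<>((a, b) -> Integer.compare(a[1], b[1]));
        List<int[]> matches =
                postings.getOrDefault(term, Collections.<int[]>emptyList());
        for (int[] p : matches) {
            lastVisited++;                    // every potential hit is visited
            top.offer(p);
            if (top.size() > n) top.poll();   // but only n are kept
        }
        List<Integer> result = new ArrayList<>();
        while (!top.isEmpty()) result.add(0, top.poll()[0]);
        return result;
    }
}
```

In this model, a term that occurs in one Document costs one visit whether the index holds ten Documents or ten million, while a term that occurs in every Document costs one visit per Document regardless of "n".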

-kuro  

> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com] 
> Sent: Thursday, June 18, 2009 12:44 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene performance: is search time linear to the 
> index size?
> 
> Opening a searcher and doing the first query incurs a 
> significant amount of overhead, cache loading, etc. Inferring 
> search times relative to index size with a program like you 
> describe is unreliable.
> 
> Try firing a few queries at the index without measuring, 
> *then* measure the time it takes for subsequent queries and 
> you'll get a much better picture of actual response time.
> 
> The fact that a program that fires a single query at a newly 
> opened reader has near-linear performance isn't as surprising 
> as all that. I'd be more concerned if, say, queries 10 
> through 100 *on the same underlying reader* displayed this behavior.
> 
> See:
> 
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed?highlight=(warming)
> 
> especially the questions around:
> *When measuring performance, disregard the first query*
> 
> Best
> Erick
> 
> On Thu, Jun 18, 2009 at 12:49 AM, Teruhiko Kurosaka <Kuro@basistech.com> wrote:
> 
> > I've written a test program that uses the simplest form of search,
> > TermQuery, and measures the time it takes to search for a term in a
> > field on indices of various sizes.
> >
> > The result is a very linear growth of search time vs. the index size
> > in terms of # of Documents, not # of unique terms in that field.
> >
> > -kuro
> >
> > 
> ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 