lucene-java-user mailing list archives

From "Jay Booth" <jbo...@wgen.net>
Subject RE: Lucene performance: is search time linear to the index size?
Date Thu, 18 Jun 2009 22:08:50 GMT
Are you fetching all of the results for your search?  If so, you're
actually measuring the time to pull n stored documents out of the index,
not the time to search over an index of n documents.  That would of
course be linear: most of your cost there is the I/O to actually pull
each document from disk, not the search time.

-----Original Message-----
From: Teruhiko Kurosaka [mailto:Kuro@basistech.com] 
Sent: Thursday, June 18, 2009 2:55 PM
To: java-user@lucene.apache.org
Subject: RE: Lucene performance: is search time linear to the index
size?

Erik,
The way I test this program is by issuing 1000 queries and
I have profiled it to make sure the start up cost is negligible.
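
The measurement approach discussed in this thread (run a few untimed
queries first, then time many queries against the same reader) might be
sketched roughly as below. This is plain JDK code, not Lucene: the
WarmTiming class, the averageNanos method and the IntSupplier standing in
for "run one query and return its hit count" are all hypothetical names
chosen for illustration.

```java
import java.util.function.IntSupplier;

/** Illustrative timing helper (hypothetical, not part of Lucene). */
final class WarmTiming {

    /**
     * Runs the query `warmup` times without timing it, then returns the
     * average wall-clock nanoseconds per call over `measured` timed calls.
     */
    static double averageNanos(IntSupplier query, int warmup, int measured) {
        if (measured <= 0) {
            throw new IllegalArgumentException("measured must be positive");
        }
        int sink = 0;
        // Untimed calls: let caches fill and the JIT compile the hot path.
        for (int i = 0; i < warmup; i++) {
            sink += query.getAsInt();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measured; i++) {
            sink += query.getAsInt();
        }
        long elapsed = System.nanoTime() - start;
        // Use the accumulated value so the calls cannot be optimized away.
        if (sink == Integer.MIN_VALUE) {
            System.out.println(sink);
        }
        return (double) elapsed / measured;
    }

    public static void main(String[] args) {
        // Stand-in workload; a real test would run a search and return the hit count.
        IntSupplier fakeQuery = () -> "lucene".hashCode() & 0xff;
        System.out.println("avg ns/query: " + averageNanos(fakeQuery, 100, 1000));
    }
}
```

In a real test the supplier would wrap the actual search call on an
already opened searcher, so that opening the index is excluded from the
timed loop.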

I ran a further test and discovered that the search time is actually
proportional to the number of potential hits.  (I am saying
"potential hits" because I am limiting the number of hits
by specifying the "n" parameter in the search method.)

Because the number of hits was proportional to the number
of Documents in the index in my previous test, I came
to the wrong conclusion that the search time is proportional
to the index size.  If only one Document can match
a query, the search time remains constant no
matter how large the index is.
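
A toy model of why this happens, assuming a single-term query walks the
postings list for that term: the work depends on how many documents
contain the term, not on how many documents the index holds. The sketch
below is plain JDK code, not Lucene's implementation; ToyInvertedIndex,
Result, search and build are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy inverted index (illustration only, not Lucene): term -> list of doc ids. */
final class ToyInvertedIndex {

    /** Outcome of one query: the first n matching doc ids and the postings examined. */
    static final class Result {
        final List<Integer> top;
        final int visited;

        Result(List<Integer> top, int visited) {
            this.top = top;
            this.visited = visited;
        }
    }

    private final Map<String, List<Integer>> postings = new HashMap<>();
    private int docCount = 0;

    /** Adds a document made of the given terms and returns its doc id. */
    int addDocument(String... terms) {
        int docId = docCount++;
        for (String term : terms) {
            List<Integer> list = postings.computeIfAbsent(term, k -> new ArrayList<>());
            // Record each document at most once per term.
            if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                list.add(docId);
            }
        }
        return docId;
    }

    /**
     * Single-term query. Every posting for the term is examined (a real engine
     * would score each one to choose the best n); only the first n ids are kept.
     */
    Result search(String term, int n) {
        List<Integer> list = postings.getOrDefault(term, new ArrayList<>());
        List<Integer> top = new ArrayList<>();
        int visited = 0;
        for (int docId : list) {
            visited++;
            if (top.size() < n) {
                top.add(docId);
            }
        }
        return new Result(top, visited);
    }

    /** Builds an index where every document has "common" and only doc 0 has "rare". */
    static ToyInvertedIndex build(int numDocs) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        for (int i = 0; i < numDocs; i++) {
            if (i == 0) {
                index.addDocument("common", "rare");
            } else {
                index.addDocument("common");
            }
        }
        return index;
    }

    public static void main(String[] args) {
        for (int size : new int[] {1_000, 100_000}) {
            ToyInvertedIndex index = build(size);
            System.out.println(size + " docs: rare visits " + index.search("rare", 10).visited
                    + ", common visits " + index.search("common", 10).visited);
        }
    }
}
```

In this model a query for "rare" examines one posting regardless of index
size, while a query for "common" examines one posting per document even
though only n ids are returned.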

-kuro  

> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com] 
> Sent: Thursday, June 18, 2009 12:44 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene performance: is search time linear to the 
> index size?
> 
> Opening a searcher and doing the first query incurs a 
> significant amount of overhead, cache loading, etc. Inferring 
> search times relative to index size with a program like you 
> describe is unreliable.
> 
> Try firing a few queries at the index without measuring, 
> *then* measure the time it takes for subsequent queries and 
> you'll get a much better picture of actual response time.
> 
> The fact that a program that fires a single query at a newly 
> opened reader has near-linear performance isn't as surprising 
> as all that. I'd be more concerned if, say, queries 10 
> through 100 *on the same underlying reader* displayed this behavior.
> 
> See:
> 
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed?highlight=(warming)
> 
> especially the questions around:
> *When measuring performance, disregard the first query*
>
> Best
> Erick
>
> On Thu, Jun 18, 2009 at 12:49 AM, Teruhiko Kurosaka
> <Kuro@basistech.com> wrote:
> 
> > I've written a test program that uses the simplest form of search,
> > TermQuery, and measures the time it takes to search for a term in a
> > field on indices of various sizes.
> >
> > The result is a very linear growth of search time vs. the index size
> > in terms of # of Documents, not # of unique terms in that field.
> >
> > -kuro
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 