lucene-java-user mailing list archives

From Yonik Seeley <yo...@lucidimagination.com>
Subject Re: Lucene performance: is search time linear to the index size?
Date Thu, 18 Jun 2009 22:27:29 GMT
On Thu, Jun 18, 2009 at 3:54 PM, Teruhiko Kurosaka<Kuro@basistech.com> wrote:
> Because the number of hits was proportional to the number
> of Documents in the index in my previous test, I came
> to the wrong conclusion that the search time is proportional
> to the index size.  If only one Document matches
> the query, the search time remains constant no
> matter how large the index is.

Right. An inverted index contains a list of documents that match each
term, so ignoring other overhead and effects, search time is
proportional to the number of documents matching the various clauses
of the query.
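
To make that concrete, here is a toy sketch in plain Java. It is not
Lucene code and does not reflect Lucene's actual on-disk structures; the
class name PostingsSketch and its methods are made up for illustration.
It keeps one posting list per term, so a single-term lookup only walks
the documents that contain that term, however many other documents
are in the index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory inverted index (hypothetical; for illustration only). */
public class PostingsSketch {
    // term -> ascending list of ids of the documents that contain the term
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private int docCount = 0;

    /** Indexes a document given as whitespace-separated terms; returns its id. */
    public int addDocument(String text) {
        int docId = docCount++;
        for (String term : text.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> list = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // Record each document at most once per term.
            if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                list.add(docId);
            }
        }
        return docId;
    }

    /**
     * Single-term lookup. The work done is one hash lookup plus one step
     * per entry in that term's posting list, so it depends on the number
     * of matching documents, not on docCount.
     */
    public List<Integer> search(String term) {
        List<Integer> list = postings.get(term.toLowerCase());
        if (list == null) {
            return Collections.emptyList();
        }
        List<Integer> hits = new ArrayList<>(list.size());
        for (int docId : list) {
            hits.add(docId);
        }
        return hits;
    }

    public static void main(String[] args) {
        PostingsSketch index = new PostingsSketch();
        index.addDocument("rare common");
        for (int i = 0; i < 100000; i++) {
            index.addDocument("common filler");
        }
        // "rare" walks one posting; "common" walks one per document.
        System.out.println("rare hits:   " + index.search("rare").size());
        System.out.println("common hits: " + index.search("common").size());
    }
}
```

In this sketch, growing the index with documents that do not contain
"rare" leaves the cost of searching for "rare" unchanged, while the cost
of searching for "common" grows with every document added.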

-Yonik
http://www.lucidimagination.com

> -kuro
>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerickson@gmail.com]
>> Sent: Thursday, June 18, 2009 12:44 AM
>> To: java-user@lucene.apache.org
>> Subject: Re: Lucene performance: is search time linear to the
>> index size?
>>
>> Opening a searcher and doing the first query incurs a
>> significant amount of overhead, cache loading, etc. Inferring
>> search times relative to index size with a program like you
>> describe is unreliable.
>>
>> Try firing a few queries at the index without measuring,
>> *then* measure the time it takes for subsequent queries and
>> you'll get a much better picture of actual response time.
>>
>> The fact that a program that fires a single query at a newly
>> opened reader has near-linear performance isn't as surprising
>> as all that. I'd be more concerned if, say, queries 10
>> through 100 *on the same underlying reader* displayed this behavior.
>>
>> See:
>>
>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed?highlight=(warming)
>>
>> especially the questions around:
>> *When measuring performance, disregard the first query*
>>
>> Best
>> Erick
>>
>> On Thu, Jun 18, 2009 at 12:49 AM, Teruhiko Kurosaka <Kuro@basistech.com> wrote:
>>
>> > I've written a test program that uses the simplest form of search,
>> > TermQuery, and measures the time it takes to search for a term in a
>> > field on indices of various sizes.
>> >
>> > The result is a very linear growth of search time vs. the index size
>> > in terms of # of Documents, not # of unique terms in that field.
>> >
>> > -kuro
>> >
>> >
>> ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>> >
>>
