Subject: RE: Lucene performance: is search time linear to the index size?
Date: Thu, 18 Jun 2009 18:08:50 -0400
From: "Jay Booth"
Reply-To: java-user@lucene.apache.org

Are you fetching all of the results for your search? If so, you're actually measuring the time to pull n stored documents out of the index, not the time to search an index of n documents. That would of course be linear: most of your cost there is the I/O to pull each document from disk, not the search itself.

-----Original Message-----
From: Teruhiko Kurosaka [mailto:Kuro@basistech.com]
Sent: Thursday, June 18, 2009 2:55 PM
To: java-user@lucene.apache.org
Subject: RE: Lucene performance: is search time linear to the index size?

Erick,

I test this program by issuing 1000 queries, and I have profiled it to make sure the start-up cost is negligible.

I ran a further test and discovered that the search time is actually proportional to the number of potential hits. (I say "potential hits" because I limit the number of hits returned by specifying the "n" parameter in the search method.) Because the number of hits was proportional to the number of Documents in the index in my previous test, I came to the wrong conclusion that the search time is proportional to the index size. If only one Document can match a query, the search time remains constant no matter how large the index is.

-kuro

> -----Original Message-----
> From: Erick Erickson [mailto:erickerickson@gmail.com]
> Sent: Thursday, June 18, 2009 12:44 AM
> To: java-user@lucene.apache.org
> Subject: Re: Lucene performance: is search time linear to the index size?
>
> Opening a searcher and running the first query incurs a significant amount of overhead (cache loading, etc.). Inferring search times relative to index size with a program like the one you describe is unreliable.
>
> Try firing a few queries at the index without measuring, *then* measure the time subsequent queries take, and you'll get a much better picture of actual response time.
>
> That a program firing a single query at a newly opened reader shows near-linear performance isn't all that surprising. I'd be more concerned if, say, queries 10 through 100 *on the same underlying reader* displayed this behavior.
>
> See:
>
> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed?highlight=(warming)
>
> especially the questions around:
> *When measuring performance, disregard the first query*
>
> Best
> Erick
>
> On Thu, Jun 18, 2009 at 12:49 AM, Teruhiko Kurosaka wrote:
>
> > I've written a test program that uses the simplest form of search, TermQuery, and measures the time it takes to search for a term in a field on indices of various sizes.
> >
> > The result is a very linear growth of search time with index size, measured in # of Documents, not # of unique terms in that field.
> >
> > -kuro
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
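Erick's advice (discard the first few queries, then measure on the same open reader) and Jay's point (time the search separately from pulling stored documents) can be combined in a small timing harness. The following is a minimal sketch in plain Java, not Lucene code: the workloads (`fakeQuery`, the id-returning search lambda, the no-op fetch) are hypothetical stand-ins that you would replace with a search against an already-opened searcher (with its "n" hit limit) and with loading each hit's stored document.

```java
import java.util.Arrays;
import java.util.function.IntConsumer;
import java.util.function.IntFunction;

// Sketch of a timing harness following the advice in this thread.
// All workloads are hypothetical stand-ins, NOT Lucene APIs.
class WarmedBenchmark {

    // Runs `warmups` unmeasured calls (absorbing first-query overhead, cache
    // loading, JIT compilation), then times `runs` calls and returns the
    // per-call durations in nanoseconds.
    static long[] timeAfterWarmup(Runnable query, int warmups, int runs) {
        if (warmups < 0 || runs <= 0) {
            throw new IllegalArgumentException("need warmups >= 0 and runs > 0");
        }
        for (int i = 0; i < warmups; i++) {
            query.run(); // timing deliberately discarded
        }
        long[] samples = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            query.run();
            samples[i] = System.nanoTime() - start;
        }
        return samples;
    }

    // Times the search phase (collecting the top-n ids only) separately from
    // fetching each hit's stored document, so fetch I/O is not mistaken for
    // search cost. Returns {searchNanos, fetchNanos}.
    static long[] timeSearchThenFetch(IntFunction<int[]> searchTopN, IntConsumer fetchDoc, int n) {
        long t0 = System.nanoTime();
        int[] ids = searchTopN.apply(n);
        long t1 = System.nanoTime();
        for (int id : ids) {
            fetchDoc.accept(id);
        }
        long t2 = System.nanoTime();
        return new long[] {t1 - t0, t2 - t1};
    }

    // Median of the samples (the upper median when the count is even).
    static long median(long[] samples) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static void main(String[] args) {
        // Hypothetical workload: a busy loop standing in for one query on an open searcher.
        Runnable fakeQuery = () -> {
            long acc = 0;
            for (int i = 0; i < 10_000; i++) acc += i;
            if (acc < 0) System.out.println(acc); // discourages the JIT from eliminating the loop
        };
        long[] samples = timeAfterWarmup(fakeQuery, 10, 100);
        System.out.println("median ns per query after warm-up: " + median(samples));

        // Hypothetical search returning ids 0..n-1, plus a no-op "fetch".
        long[] phases = timeSearchThenFetch(
                n -> {
                    int[] ids = new int[n];
                    for (int i = 0; i < n; i++) ids[i] = i;
                    return ids;
                },
                id -> { },
                10);
        System.out.println("search ns: " + phases[0] + ", fetch ns: " + phases[1]);
    }
}
```

Reporting the median rather than the mean keeps a single slow outlier (a GC pause, a cold disk read) from dominating the figure. Following Kuro's finding, it may also be worth varying the number of matching documents independently of the total index size.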