From: "Jake Mannix"
To: java-user@lucene.apache.org
Date: Sun, 3 Feb 2008 16:31:59 -0800
Subject: Re: Indexing Speed: 2.3 vs 2.2 (real world numbers)

Yeah, I should have mentioned: this was merely a jar replacement. We haven't gotten around to doing the fun 2.3-related stuff yet, like making sure our domain-specific tokenizers use next(Token), or making sure we set all of our buffer sizes by RAM used.

We tried multithreading the process, since we have a multi-core, multi-disk architecture, but for some reason we never saw more than 99% CPU usage (of one core) during indexing, as if some internal synchronization was getting hit... I should try it again under the profiler and see if I can pinpoint where it was getting tripped up.

On the other hand, I'm not sure we *need* faster than 26-minute indexing, so once we're sure we can move up to 2.3 for production, that may just solve our indexing perf issues.

Now if I can just figure out how to speed up our query performance too, I'll be in an even *better* mood. :)

-jake

On Feb 3, 2008 2:11 PM, Michael McCandless wrote:
>
> Awesome!
> We are glad to hear that :)
>
> You might be able to make it even faster with the steps here:
>
> http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
>
> Mike
>
> Jake Mannix wrote:
>
> > Hello all,
> > I know you Lucene devs did a lot of work on indexing performance in 2.3,
> > and I just tested it out last Thursday, so I thought I'd let you know how it
> > fared:
> >
> > On a 2.17 million document index, a recent test gave these indexing times:
> >
> > * Lucene 2.2: 4.83 hours
> > * Lucene 2.3: 26 minutes
> >
> > About a factor of 11 speedup. Holy smokes! Great work, folks.
> >
> > -jake
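For readers unfamiliar with the next(Token) change mentioned above: Lucene 2.3 added a TokenStream.next(Token) variant so a tokenizer can fill in a Token the caller hands it, rather than allocating a fresh Token for every term. The sketch below illustrates only that reuse pattern and does not depend on Lucene. TokenReuseSketch, MiniToken, and WhitespaceMiniTokenizer are hypothetical names made up for this example (they are not Lucene classes), and splitting on whitespace is an assumption chosen for brevity.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free illustration of the "reuse the token you are handed" pattern.
// None of these class names exist in Lucene; they only mirror the idea behind next(Token).
public class TokenReuseSketch {

    /** Minimal stand-in for a token: a reusable char buffer plus start/end offsets. */
    static final class MiniToken {
        char[] buffer = new char[16];
        int length;
        int start;
        int end;

        /** Overwrites this token's contents, growing the buffer only when it is too small. */
        void set(char[] src, int from, int len, int startOffset, int endOffset) {
            if (buffer.length < len) {
                buffer = new char[Math.max(len, buffer.length * 2)];
            }
            System.arraycopy(src, from, buffer, 0, len);
            length = len;
            start = startOffset;
            end = endOffset;
        }

        /** Copies the current term out of the shared buffer. */
        String term() {
            return new String(buffer, 0, length);
        }
    }

    /** Splits on whitespace, filling the caller's token instead of allocating a new one. */
    static final class WhitespaceMiniTokenizer {
        private final char[] text;
        private int pos = 0;

        WhitespaceMiniTokenizer(String input) {
            this.text = input.toCharArray();
        }

        /** Returns the reusable token filled with the next term, or null at end of input. */
        MiniToken next(MiniToken reusable) {
            while (pos < text.length && Character.isWhitespace(text[pos])) {
                pos++; // skip leading whitespace
            }
            if (pos >= text.length) {
                return null;
            }
            int begin = pos;
            while (pos < text.length && !Character.isWhitespace(text[pos])) {
                pos++; // consume one term
            }
            reusable.set(text, begin, pos - begin, begin, pos);
            return reusable;
        }
    }

    /** Drains a tokenizer using a single MiniToken instance for the whole stream. */
    static List<String> terms(String input) {
        WhitespaceMiniTokenizer tokenizer = new WhitespaceMiniTokenizer(input);
        MiniToken reusable = new MiniToken();
        List<String> out = new ArrayList<>();
        for (MiniToken t = tokenizer.next(reusable); t != null; t = tokenizer.next(reusable)) {
            out.add(t.term()); // copy out now: the buffer is overwritten on the next call
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(terms("  indexing speed 2.3 vs 2.2 "));
    }
}
```

Running main prints [indexing, speed, 2.3, vs, 2.2]. Because the same MiniToken is refilled on every call, a consumer that wants to keep a term has to copy it out (as terms() does via term()) before asking for the next one; that caveat is the price of skipping the per-token allocation.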