Date: Fri, 13 Oct 2006 21:41:38 -0400
From: "Mark Miller"
To: java-user@lucene.apache.org
Subject: Re: Large index question

I recently experimented with an index of 2 million documents averaging 2-10 KB each. The machine had 4 GB of RAM and a 3 GHz dual-core processor (I was not using a parallel searcher to take advantage of the extra core). That is pretty beefy, but it was handling four times the docs you're talking about. I didn't see any query without a sort take over a second. A similar setup on a single-core AMD 64 3200+ with 1 GB of RAM was also blazingly fast (again, no sorts involved).

- Mark

On 10/12/06, Scott Smith wrote:
>
> Suppose I want to index 500,000 documents (average document size is
> 4 KB). Let's assume I create a single index and that the index is
> static (I'm not going to add any new documents to it). I would guess
> the index would be around 2GB.
>
> Now, I do searches against this on a somewhat beefy machine (2GB RAM,
> Core 2 Duo, Windows XP). Does anyone have any idea what kinds of search
> times I can expect for moderately complicated searches (several sets of
> keywords against several fields)? Are there things I can do to increase
> search performance? For example, does Lucene like lots of RAM, lots of
> CPU, faster HD, all of the above? Am I better off splitting the index file
> into 2 (N?) versions and searching on multiple indexes simultaneously?
>
> Anyone have any thoughts about this?
>
> Scott
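
On the question of splitting the index and searching the pieces simultaneously, here is a minimal sketch of the fan-out-and-merge idea in plain Java. It is not Lucene code: Doc, Hit, Shard and searchAll are hypothetical names, each "shard" is just an in-memory list standing in for one index partition, and the score is simply the number of query terms found in the text. It only illustrates running one query against N partitions on separate threads and merging the best hits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ShardedSearchSketch {

    /** Hypothetical document: an id plus its text. */
    static final class Doc {
        final String id;
        final String text;
        Doc(String id, String text) { this.id = id; this.text = text; }
    }

    /** Hypothetical scored hit. */
    static final class Hit {
        final String docId;
        final int score;
        Hit(String docId, int score) { this.docId = docId; this.score = score; }
    }

    /** Stand-in for one index partition (assumption: not a real Lucene index). */
    static final class Shard {
        final List<Doc> docs;
        Shard(List<Doc> docs) { this.docs = docs; }

        // Toy scoring: count how many query terms occur in the document text.
        List<Hit> search(List<String> terms) {
            List<Hit> hits = new ArrayList<>();
            for (Doc d : docs) {
                int score = 0;
                for (String t : terms) {
                    if (d.text.contains(t)) score++;
                }
                if (score > 0) hits.add(new Hit(d.id, score));
            }
            return hits;
        }
    }

    /** Runs the query on every shard concurrently, then merges and keeps the top N hits. */
    static List<Hit> searchAll(List<Shard> shards, List<String> terms, int topN) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            // Fan out: one task per shard.
            List<Future<List<Hit>>> futures = new ArrayList<>();
            for (Shard s : shards) {
                futures.add(pool.submit(() -> s.search(terms)));
            }
            // Gather: wait for each shard and collect its hits.
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : futures) {
                merged.addAll(f.get());
            }
            // Highest score first; ties broken by docId so the order is deterministic.
            merged.sort((a, b) -> a.score != b.score
                    ? Integer.compare(b.score, a.score)
                    : a.docId.compareTo(b.docId));
            return new ArrayList<>(merged.subList(0, Math.min(topN, merged.size())));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Shard> shards = Arrays.asList(
                new Shard(Arrays.asList(new Doc("a1", "lucene index search"),
                                        new Doc("a2", "unrelated text"))),
                new Shard(Arrays.asList(new Doc("b1", "lucene search"),
                                        new Doc("b2", "index of lucene docs"))));
        for (Hit h : searchAll(shards, Arrays.asList("lucene", "index"), 2)) {
            System.out.println(h.docId + " score=" + h.score);
        }
    }
}
```

Whether this kind of split helps in practice depends on whether the search is CPU-bound or disk-bound, so it is worth measuring against a single index before committing to it.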