Subject: Re: Scaling out/up or a mix
From: Toke Eskildsen
Reply-To: te@statsbiblioteket.dk
To: "java-user@lucene.apache.org"
Organization: Statsbiblioteket
Date: Tue, 30 Jun 2009 10:49:12 +0200
Message-ID: <1246351752.3464.18.camel@pc286>

On Mon, 2009-06-29 at 09:47 +0200, Marcus Herou wrote:
> Index size (and growing): 16Gx8 = 128G
> Doc size (data): 20k
> Num docs: 90M
> Num users: Few hundred, but most critical is the admin staff, which is
> using the index all day long.
> Query types: Example: title:"Iphone" OR description:"Iphone" sorted by
> publishedDate... = Very simple, no fuzzy searches etc. However, since the
> dataset is large it will consume memory on sorting, I guess.
>
> Could not one draw any conclusions about best-practice in terms of hardware
> given the above "specs"?

Can you give us an estimate of the number of concurrent searches in
prime time, and of the range in which a response time would be
satisfactory?

Going for a fully RAM-based search on a corpus of this size would mean
that each machine holds about 30GB of index (taken from your hardware
suggestion). I would expect such a machine to be able to serve
something like 500-1000 searches/second if we just measure the raw
search time and the lookup of one or two fields for the first 20 hits.
That figure is highly dependent on the index and the searches, but what
you're describing sounds simple enough. Is that what you're aiming for?

Wrapping the searcher in web services and such lowers the number of
searches that can be performed, which makes the RAM option even more
expensive relative to a hard disk or SSD solution.

> I mean it is very simple: Let's say someone gives me a budget of 50 000 USD
> and I then want to get the most bang for the buck for my workload.

I am a bit unclear on your overall goal. Do you expect the number of
users to grow significantly?
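The sizing arithmetic in this reply can be sketched as a small back-of-envelope calculation. This is only an illustration, not a measured result: the function name `plan_machines` and the target search rates are hypothetical, the 128G index, 30GB per machine and 500 searches/second figures are the ones quoted in the thread, and the sketch assumes that every search has to touch every part of the index, so extra throughput comes from adding full copies of the machine set.

```python
import math


def plan_machines(index_gb, ram_index_gb_per_machine, target_qps, qps_per_machine):
    """Rough machine count for a fully RAM-based index (hypothetical helper).

    Assumption: each search touches every index part, so one full set of
    machines serves roughly qps_per_machine searches/second in total.
    """
    # Machines needed just to hold the whole index in RAM.
    shards = math.ceil(index_gb / ram_index_gb_per_machine)
    # Full copies of that set needed to reach the target search rate.
    replicas = max(1, math.ceil(target_qps / qps_per_machine))
    return {"shards": shards, "replicas": replicas, "machines": shards * replicas}


if __name__ == "__main__":
    index_gb = 16 * 8  # 128G, as stated in the quoted message
    # Target rates are made-up examples; the real number is what is being asked for.
    for target_qps in (100, 500, 2000):
        # 500 searches/second is the low end of the estimate in the reply.
        print(target_qps, plan_machines(index_gb, 30, target_qps, 500))
```

With the quoted figures, the machine count is driven by index size alone until the target rate exceeds what one set of machines can serve, which is why the number of concurrent searches matters so much for the budget question.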