From: "James"
To: general@lucene.apache.org
Subject: RE: Infrastructure for large Lucene index
Date: Fri, 6 Oct 2006 16:05:17 -0400
Message-Id: <200610062005.k96K5ErW004903@smtp.ryley.com>
In-Reply-To: <20061006182748.8460.qmail@web55108.mail.re4.yahoo.com>

Hi Slava,

We currently do this across many machines for http://www.FreePatentsOnline.com.
Our indexes are, in aggregate across our various collections, even larger than you need. We use a remote ParallelMultiSearcher, with some custom modifications (and we are in the process of making more) to allow more robust handling of many processes at once and integration of the responses from the various sub-indexes. This works fine on commodity hardware, and you will be I/O bound, so get multiple drives in each machine.

Out of curiosity, what project are you working on? That's a lot of hits!

Sincerely,
James Ryley, Ph.D.
www.FreePatentsOnline.com

> -----Original Message-----
> From: Slava Imeshev [mailto:imeshev@yahoo.com]
> Sent: Friday, October 06, 2006 2:28 PM
> To: general@lucene.apache.org
> Subject: Infrastructure for large Lucene index
>
> I am dealing with a pretty challenging task, so I thought it would be
> a good idea to ask the community before I re-invent any wheels of my own.
>
> I have a Lucene index that is going to grow to 100 GB soon. This
> index is going to be read very aggressively (tens of millions of requests
> per day) with some occasional updates (10 batches per day).
>
> The idea is to split the load between multiple server nodes running Lucene
> on *nix while accessing the same index, shared across the network.
>
> I am wondering if this is a good idea and/or if there are any recommendations
> regarding selecting/tweaking the network configuration (software + hardware)
> for an index of this size.
>
> Thank you.
>
> Slava Imeshev
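[Editor's note: the scatter-gather pattern James describes (each sub-index answers the query with its own top hits, and a coordinator integrates them into one ranked list) can be sketched with a toy merge step. This is a minimal, library-free illustration, not Lucene's actual ParallelMultiSearcher code; the Hit class and mergeTopK method are hypothetical names.]

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy scatter-gather merge: each shard (sub-index) returns its own top-k
// hits, and the coordinator combines them into a single global top-k by score.
public class ShardMerge {
    // Illustrative hit record (document id + relevance score); not a Lucene class.
    static final class Hit {
        final String docId;
        final float score;
        Hit(String docId, float score) { this.docId = docId; this.score = score; }
    }

    // Merge per-shard result lists, keeping the k highest-scoring hits overall.
    static List<Hit> mergeTopK(List<List<Hit>> shardResults, int k) {
        // Min-heap bounded at size k: the root is the weakest hit kept so far.
        PriorityQueue<Hit> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> shard : shardResults) {
            for (Hit hit : shard) {
                heap.offer(hit);
                if (heap.size() > k) heap.poll(); // evict the weakest hit
            }
        }
        List<Hit> merged = new ArrayList<>(heap);
        merged.sort((a, b) -> Float.compare(b.score, a.score)); // best first
        return merged;
    }

    public static void main(String[] args) {
        List<Hit> shardA = List.of(new Hit("a1", 0.9f), new Hit("a2", 0.4f));
        List<Hit> shardB = List.of(new Hit("b1", 0.7f), new Hit("b2", 0.6f));
        for (Hit h : mergeTopK(List.of(shardA, shardB), 3)) {
            System.out.println(h.docId + " " + h.score);
        }
    }
}
```

In a real deployment the inner lists would arrive over the network from remote searchers; only the small per-shard top-k lists cross the wire, which is why the approach scales while each node stays I/O bound on its local drives.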