Subject: Re: Maximum index file size
From: Felipe Lobo
To: java-user@lucene.apache.org
Date: Fri, 23 Oct 2009 15:12:54 -0300

Hi, interesting
discussion. Suppose my index now holds 1 TB. I split it across 16 HDs
(65 GB per HD) in the same machine, which has 16 cores. Is using
ParallelMultiSearcher a good idea for this structure? Will results be
fast? Is there a better solution for this setup?

thanks,

On Fri, Oct 23, 2009 at 9:33 AM, Toke Eskildsen wrote:

> On Fri, 2009-10-23 at 08:49 +0200, Jake Mannix wrote:
> > One of the big problems you'll run into with this index size is that
> > you'll never have enough RAM to give your OS's IO cache enough room
> > to keep much of this index in memory, so you're going to be seeking
> > in this monster file a lot. [...]
>
> Solid State Drives help a lot in this respect. We've done experiments
> with a 40 GB index, adjusting the amount of RAM available for the file
> cache. We observed that search speed with SSDs wasn't nearly as
> susceptible to cache size as with conventional hard disks.
>
> Some quick and fairly unstructured notes on our observations:
> http://wiki.statsbiblioteket.dk/summa/Hardware
>
> > [...]
> > This may be mitigated by using really fast disks, possibly, which is
> > yet another reason why you'll need to do some performance profiling
> > on a variety of sizes with similar-to-production data sets.
>
> For our setup, the switch from conventional hard disks to SSDs moved
> the bottleneck from I/O to CPU/RAM.

--
Felipe Lobo
www.jusbrasil.com.br
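For readers of the archive: ParallelMultiSearcher fans a query out to each sub-searcher (one per shard/disk) on its own thread and merges the per-shard hits. A minimal plain-Java sketch of that fan-out/merge pattern is below; the shard lists and the substring "match" are hypothetical stand-ins, not Lucene API, and the score-based re-sorting that ParallelMultiSearcher does on the merged hits is omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ShardedSearchSketch {

    // "Search" one shard: a plain substring match standing in for a real
    // Lucene query against the index stored on one disk.
    static List<String> searchShard(List<String> shard, String term) {
        List<String> hits = new ArrayList<>();
        for (String doc : shard) {
            if (doc.contains(term)) hits.add(doc);
        }
        return hits;
    }

    // Fan the query out to every shard on its own thread, then merge the
    // per-shard hits in shard order. The real ParallelMultiSearcher also
    // re-sorts the merged hits by score; that step is skipped here.
    static List<String> parallelSearch(List<List<String>> shards, String term)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(shards.size());
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> shard : shards) {
                futures.add(pool.submit(() -> searchShard(shard, term)));
            }
            List<String> merged = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                merged.addAll(f.get()); // blocks until that shard finishes
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<List<String>> shards = Arrays.asList(
                Arrays.asList("lucene in action", "java basics"),
                Arrays.asList("lucene scoring", "ssd benchmarks"));
        System.out.println(parallelSearch(shards, "lucene"));
        // prints: [lucene in action, lucene scoring]
    }
}
```

With one shard per physical disk, each worker thread seeks on its own spindle, which is the main reason fan-out can help a 16-disk, 16-core box; whether it actually does depends on the profiling the thread discusses.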