Subject: Re: The best way forward
From: Philippe Laflamme
To: Lucene Users List
Date: Fri, 31 Oct 2003 10:43:08 -0500

Having more RAM does not necessarily mean you can use
it in your process. Keep in mind that a Xeon is a 32 bit x86 CPU, so
every process has a 32 bit virtual address space. This means that
theoretically a process cannot access more than 4GB (all processes taken
together can add up to 16GB, and more with swap). I say theoretically
because there are other limitations.

Last I heard, Linux can only give a process 2GB to 3GB of that address
space, since the kernel keeps the rest for itself. This is even worse
for a Java VM since it needs to allocate its heap in one contiguous
chunk. That makes the available memory even lower: look at
/proc/<pid>/maps of a Java process and you'll see that dynamic libraries
create holes in the memory map that the VM cannot use for its heap. In
practice a Java VM on Linux is limited to approximately 1.5GB. Try it:
java -Xmx2000m will fail on a 32 bit Linux machine. There might be some
kernel patches that I am not aware of though.

If your 6M docs take up more than what you can access in one process,
you'll have to split up your processing into multiple VMs. Each VM could
load a RAMDirectory index that fits within these limits. Then you'd have
another process that queries these "distributed" indexes.

Yes, 16GB of RAM is definitely fun... on a 64 bit architecture.

Phil

On Fri, 2003-10-31 at 08:23, Otis Gospodnetic wrote:
> Wow, with 16GB RAM, I would definitely load the index into RAM. You
> can use RAMDirectory(Directory) constructor for that.
>
> As for RAMDrives..... I have no experience with those, but I have heard
> of some people using ramfs under Linux. Ramfs is a memory based
> filesystem. Mount it and you have it. Unmount it and it is gone.
>
> Otis
>
>
> --- jt oob wrote:
> > Hi,
> > I am currently indexing around 6 million text
> > documents using lucene.
> >
> > We have a new server arriving in the next few weeks
> > which the queries will be run on. With the following
> > stats: Dell 6650 - 4 x Xeon HT CPU's, 16 GB RAM, 36GB
> > SCSI Ultra160 Hdd. (connected to 1.5TB IDE RAID with
> > actual source documents)
> >
> > What is the best strategy for fast searches?
> > Do I need to write a server which holds the indexes in
> > a RAMDirectory?
> >
> > When should RAMDirectories generally be used? I have
> > read several articles saying that RAMDrives under
> > linux
> > are rarely a good idea, but am not sure on how to
> > interpret this in the context of lucene and
> > RAMDirectories.
> >
> > I have looked over the documentation I have found on
> > the lucene web site - hope i haven't missed something.
> >
> > Is there a general guide to building large search
> > engines with lucene? I am very new to the whole field.
> >
> > Thanks for the great software!
> > jt
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
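
To put rough numbers on the multi-VM split suggested above, here is a
minimal Java sketch. It is only an illustration under stated
assumptions: the class HeapPlanner, the method shardsNeeded, the 6GB
index size and the 75% headroom factor are all made up for the example
and come from neither Lucene nor the original question. The only real
JDK call is Runtime.getRuntime().maxMemory(), which reports the heap
ceiling the running VM was actually given.

```java
// Hypothetical helper: estimates how many JVMs would be needed so that each
// one can hold its share of an index in memory.
final class HeapPlanner {

    /**
     * Ceiling division: number of VMs needed when each VM can hold at most
     * usableBytesPerVm bytes of index data.
     */
    public static long shardsNeeded(long indexBytes, long usableBytesPerVm) {
        if (indexBytes < 0 || usableBytesPerVm <= 0) {
            throw new IllegalArgumentException("sizes must be non-negative / positive");
        }
        long full = indexBytes / usableBytesPerVm;
        // One extra VM for the remainder, if any.
        return (indexBytes % usableBytesPerVm == 0) ? full : full + 1;
    }

    public static void main(String[] args) {
        // Heap ceiling of this VM (roughly what -Xmx allowed).
        long maxHeap = Runtime.getRuntime().maxMemory();

        // Assumption: keep a quarter of the heap free for searching, caches, etc.
        long usable = maxHeap / 4 * 3;

        // Assumption: a made-up index size of 6GB, purely for illustration.
        long indexBytes = 6L * 1024 * 1024 * 1024;

        System.out.println("max heap (MB):   " + maxHeap / (1024 * 1024));
        System.out.println("usable (MB):     " + usable / (1024 * 1024));
        System.out.println("VMs needed:      " + shardsNeeded(indexBytes, usable));
    }
}
```

Run it with the largest -Xmx your machine accepts to see what one
process can really hold. For instance, a 6GB index with 1.5GB usable
per VM works out to 4 VMs, each loading its own RAMDirectory.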