lucene-java-user mailing list archives

From Philippe Laflamme <plafla...@konova.com>
Subject Re: The best way forward
Date Fri, 31 Oct 2003 15:43:08 GMT
Having more RAM does not necessarily mean you can use it in your
process. Keep in mind that a Xeon is a 32-bit x86 architecture, so a
pointer can only address 4GB of memory.

This means that, theoretically, a single process cannot access more than
4GB (all processes taken together can add up to 16GB and more, counting
swap).

I say theoretically because there are other limitations. Last I heard,
Linux can only access 2GB of RAM per process. This is even worse for a
Java VM, since it needs to allocate its heap as one contiguous chunk.

This Java VM limitation makes the available memory even lower: look at
/proc/<pid>/maps of a Java process and you'll see that dynamic libraries
create holes in the memory map that the VM cannot use for its heap. A
Java VM on 32-bit Linux is limited to approximately 1.5GB. Try it:
java -Xmx2000m will fail on a 32-bit Linux machine. There might be some
kernel patches that I am not aware of, though.
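
To see what a given VM actually got, a small check along these lines can
help. This is only a sketch: the class name HeapCheck and the 1536MB
ceiling are assumptions for illustration (the ceiling is the rough
1.5GB figure above, not a measured value). Runtime.maxMemory() is the
standard call that reports the maximum heap the VM will attempt to use.

```java
public class HeapCheck {
    /** Converts a byte count to whole megabytes (rounding down). */
    public static long toMegabytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    /** True if a requested heap size fits under an assumed per-process ceiling. */
    public static boolean fitsUnder(long requestedMb, long ceilingMb) {
        return requestedMb <= ceilingMb;
    }

    public static void main(String[] args) {
        // Maximum heap this VM will attempt to use, as set by -Xmx or the default.
        long maxHeapMb = toMegabytes(Runtime.getRuntime().maxMemory());
        System.out.println("Max heap available to this VM: " + maxHeapMb + " MB");

        // Assumption: roughly 1.5GB usable on a 32-bit Linux VM.
        long ceilingMb = 1536;
        System.out.println("Would -Xmx2000m fit under the assumed ceiling? "
                + fitsUnder(2000, ceilingMb));
    }
}
```

Run it with different -Xmx values (for example java -Xmx1400m HeapCheck)
to find the largest setting the machine accepts.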

If your 6M docs take up more than what you can access in one process,
you'll have to split up your processing into multiple VMs. Each VM could
load a RAMDirectory index that fits into the limitations. Then you'd
have another process that could access these "distributed" indexes.
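
The split can be planned with some simple arithmetic. The sketch below
is hypothetical throughout: the class name ShardPlan, the 5000MB index
size, and the 1200MB usable heap per VM are made-up figures, not numbers
from this thread. It does not touch Lucene at all; it only works out how
many VMs are needed and which VM a given document would land in under a
round-robin assignment.

```java
public class ShardPlan {
    /** Number of VMs needed so that each slice of the index fits in one VM's usable heap. */
    public static int vmsNeeded(long indexMb, long usableHeapMb) {
        if (indexMb < 0 || usableHeapMb <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        // Ceiling division: a partial slice still needs a whole VM.
        return (int) ((indexMb + usableHeapMb - 1) / usableHeapMb);
    }

    /** Assigns a document number to a VM, round-robin. */
    public static int vmFor(long docNumber, int vmCount) {
        if (vmCount <= 0) {
            throw new IllegalArgumentException("vmCount must be positive");
        }
        return (int) Math.floorMod(docNumber, (long) vmCount);
    }

    public static void main(String[] args) {
        long indexMb = 5000;      // assumption: total index size
        long usableHeapMb = 1200; // assumption: headroom left under the ~1.5GB limit
        int vms = vmsNeeded(indexMb, usableHeapMb);
        System.out.println("VMs needed: " + vms);
        System.out.println("Doc 5999999 goes to VM " + vmFor(5999999L, vms));
    }
}
```

Each VM would then build and hold only its own slice in a RAMDirectory,
and the front-end process would send every query to all of them and
merge the results.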

Yes, 16GB of RAM is definitely fun... on a 64-bit architecture.

Phil

On Fri, 2003-10-31 at 08:23, Otis Gospodnetic wrote:
> Wow, with 16GB RAM, I would definitely load the index into RAM.  You
> can use RAMDirectory(Directory) constructor for that.
> 
> As for RAMDrives..... I have no experience with those, but I have heard
> of some people using ramfs under Linux.  Ramfs is a memory based
> filesystem. Mount it and you have  it.  Unmount it and it is gone.
> 
> Otis
> 
> 
> --- jt oob <jt2oob@yahoo.co.uk> wrote:
> > Hi,
> > I am currently indexing around 6 million text
> > documents using lucene.
> > 
> > We have a new server arriving in the next few weeks
> > which the queries will be run on. With the following
> > stats: Dell 6650 - 4 x Xeon HT CPU's, 16 GB RAM, 36GB
> > SCSI Ultra160 Hdd. (connected to 1.5TB IDE RAID with
> > actual source documents)
> > 
> > What is the best strategy for fast searches?
> > Do I need to write a server which holds the indexes in
> > a RAMDirectory?
> > 
> > When should RAMDirectories generally be used? I have
> > read several articles saying that RAMDrives under
> > Linux are rarely a good idea, but am not sure on how
> > to interpret this in the context of lucene and
> > RAMDirectories.
> > 
> > I have looked over the documentation I have found on
> > the lucene web site - hope i haven't missed something.
> > 
> > Is there a general guide to building large search
> > engines with lucene? I am very new to the whole field.
> > 
> > Thanks for the great software!
> > jt
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org

