lucene-java-user mailing list archives

From Dror Matalon <d...@zapatec.com>
Subject Re: The best way forward
Date Fri, 31 Oct 2003 17:55:21 GMT
There was a recent discussion on the postgres PERFORMANCE mailing list
(check out their archive) about using more than 4GB of RAM on Linux. AFAIR
the gist of it was that while a single process can't use more than 4GB,
the kernel can and *will* use the additional memory for caching.

So I'd recommend that you just give Lucene the 1.5GB a JVM can use, run
some kind of automated tests, and watch vmstat/iostat to see whether your
disks are getting hit.
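
A minimal Java sketch of such an automated test, assuming you plug in the query yourself: the `QueryTimer` class, its methods and the placeholder workload are hypothetical, not part of Lucene. Run it once on a cold cache and again after a warm-up pass, with vmstat/iostat running in another terminal, and compare the numbers.

```java
import java.util.Arrays;

/**
 * Hypothetical latency harness: times a task repeatedly and reports
 * nearest-rank percentiles. The task stands in for one Lucene search.
 */
public class QueryTimer {

    /** Runs the task `runs` times; returns per-run latencies in ms, sorted ascending. */
    public static double[] timeRuns(Runnable task, int runs) {
        double[] millis = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        Arrays.sort(millis);
        return millis;
    }

    /** Nearest-rank percentile (p in 0..100) of an already sorted, non-empty array. */
    public static double percentile(double[] sorted, double p) {
        int idx = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, Math.min(idx, sorted.length - 1))];
    }

    public static void main(String[] args) {
        long[] sink = new long[1];
        // Placeholder workload: replace with a real query against your index.
        Runnable fakeQuery = () -> {
            long s = 0;
            for (int i = 0; i < 100_000; i++) s += i;
            sink[0] = s;
        };
        double[] t = timeRuns(fakeQuery, 200);
        System.out.printf("median %.3f ms, p95 %.3f ms, max %.3f ms%n",
                percentile(t, 50), percentile(t, 95), t[t.length - 1]);
    }
}
```

If the warm-run latencies drop sharply and iostat shows little read activity, the kernel page cache is probably already doing most of what a RAMDirectory would.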

That should be much less complicated than RAMDirectories, splitting JVMs,
etc.

Regards,

Dror

On Fri, Oct 31, 2003 at 10:43:08AM -0500, Philippe Laflamme wrote:
> Having more RAM does not necessarily mean you can use it in your
> process. Keep in mind that a Xeon is a 32 bit x86 architecture, hence
> can only physically address 4GB of RAM.
> 
> This means that, in theory, a single process cannot access more than 4GB
> (all processes combined can add up to 16GB or more, thanks to swap).
> 
> I say theoretically because there are other limitations. Last I heard,
> Linux only lets a single process use 2GB of RAM. This is even worse for a
> Java VM, since it needs to allocate its heap as one contiguous chunk.
> 
> This Java VM limitation makes the available memory even lower: look at
> /proc/<pid>/maps of a Java process and you'll see that dynamic libraries
> create holes in the memory map that the VM cannot use for its heap. A Java
> VM on Linux is limited to approximately 1.5GB. Try it: java -Xmx2000m will
> fail on a 32-bit Linux machine. There might be some kernel patches that I
> am not aware of, though.
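
One way to see what heap a JVM actually got, rather than what -Xmx asked for, is to ask the running VM. A minimal sketch; the class name `HeapCheck` is hypothetical. `Runtime.maxMemory()` is the standard call reporting the maximum heap the JVM will attempt to use (it returns `Long.MAX_VALUE` when there is no inherent limit).

```java
/**
 * Hypothetical helper: prints the heap ceiling this JVM actually got,
 * next to the size of a 32-bit address space for comparison.
 */
public class HeapCheck {
    static final long MIB = 1024L * 1024L;

    /** Bytes addressable with the given pointer width, e.g. 32 -> 2^32 bytes = 4 GiB. */
    public static long addressSpaceBytes(int bits) {
        return 1L << bits;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory(); // Long.MAX_VALUE if unlimited
        System.out.println("os.arch            = " + System.getProperty("os.arch"));
        System.out.println("max heap (MiB)     = " + maxHeap / MIB);
        System.out.println("32-bit space (MiB) = " + addressSpaceBytes(32) / MIB);
    }
}
```

Run it as, e.g., `java -Xmx1500m HeapCheck`. If the requested heap cannot be reserved, the VM typically refuses to start at all rather than running with a smaller heap.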
> 
> If your 6M docs take up more than what you can access in one process,
> you'll have to split up your processing into multiple VMs. Each VM could
> load a RAMDirectory index that fits into the limitations. Then you'd
> have another process that could access these "distributed" indexes.
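
The "another process" in the scheme above has to merge each shard's hits into one ranked list. A minimal sketch of that merge step, assuming each shard VM can return its own top k hits: `ShardMerger`, `Shard`, `Hit` and the transport between processes (RMI, sockets, ...) are all hypothetical and not Lucene classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical front end that merges per-shard top-k results into a global top k. */
public class ShardMerger {

    /** One scored result from one shard (hypothetical type). */
    public static final class Hit {
        public final String shard;
        public final int doc;
        public final float score;

        public Hit(String shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }

        @Override
        public String toString() {
            return shard + ":" + doc + " (" + score + ")";
        }
    }

    /** What each shard process would expose, e.g. over RMI or sockets (hypothetical). */
    public interface Shard {
        List<Hit> search(String query, int k);
    }

    /** Queries every shard and keeps the k highest-scoring hits overall, best first. */
    public static List<Hit> searchAll(List<Shard> shards, String query, int k) {
        // Min-heap on score: the weakest of the current top k sits at the head.
        PriorityQueue<Hit> top =
                new PriorityQueue<Hit>(Comparator.comparingDouble((Hit h) -> h.score));
        for (Shard s : shards) {
            for (Hit h : s.search(query, k)) {
                top.offer(h);
                if (top.size() > k) top.poll(); // evict the current weakest hit
            }
        }
        List<Hit> result = new ArrayList<>(top);
        result.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return result;
    }

    public static void main(String[] args) {
        Shard a = (q, k) -> Arrays.asList(new Hit("a", 1, 0.9f), new Hit("a", 2, 0.2f));
        Shard b = (q, k) -> Arrays.asList(new Hit("b", 7, 0.5f));
        System.out.println(searchAll(Arrays.asList(a, b), "lucene", 2));
    }
}
```

Keep in mind that tf-idf scores from separately built indexes are not strictly comparable, since each shard computes term statistics over its own documents; merging raw scores like this is an approximation.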
> 
> Yes, 16GB of RAM is definitely fun... on a 64-bit architecture.
> 
> Phil
> 
> On Fri, 2003-10-31 at 08:23, Otis Gospodnetic wrote:
> > Wow, with 16GB RAM, I would definitely load the index into RAM. You
> > can use the RAMDirectory(Directory) constructor for that.
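
Because a RAMDirectory keeps every index file on the Java heap, it is worth checking that the on-disk index fits under the JVM's maximum heap before copying it in. A minimal sketch for a current JDK; `IndexFitCheck` is hypothetical, and the 2x headroom factor is an arbitrary assumption (searching needs memory beyond the raw index bytes), not a Lucene rule.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

/** Hypothetical pre-flight check: will this index directory fit on the heap? */
public class IndexFitCheck {

    /** Total size in bytes of all regular files under dir, recursively. */
    public static long indexBytes(Path dir) throws IOException {
        try (Stream<Path> files = Files.walk(dir)) {
            return files.filter(Files::isRegularFile)
                        .mapToLong(IndexFitCheck::sizeOf)
                        .sum();
        }
    }

    private static long sizeOf(Path p) {
        try {
            return Files.size(p);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if indexBytes, scaled by the headroom factor, fits under maxHeapBytes. */
    public static boolean fitsInHeap(long indexBytes, long maxHeapBytes, double headroom) {
        return indexBytes * headroom <= maxHeapBytes;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        long size = indexBytes(dir);
        long heap = Runtime.getRuntime().maxMemory();
        System.out.printf("index %d MiB, max heap %d MiB, fits with 2x headroom: %b%n",
                size >> 20, heap >> 20, fitsInHeap(size, heap, 2.0));
    }
}
```

Pass the index directory as the first argument, under the same -Xmx setting the search server will use.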
> > 
> > As for RAM drives, I have no experience with those, but I have heard
> > of some people using ramfs under Linux. Ramfs is a memory-based
> > filesystem. Mount it and you have it. Unmount it and it is gone.
> > 
> > Otis
> > 
> > 
> > --- jt oob <jt2oob@yahoo.co.uk> wrote:
> > > Hi,
> > > I am currently indexing around 6 million text documents using Lucene.
> > > 
> > > We have a new server arriving in the next few weeks, which the
> > > queries will be run on. Its specs: Dell 6650, 4 x Xeon HT CPUs,
> > > 16GB RAM, 36GB SCSI Ultra160 HDD (connected to a 1.5TB IDE RAID
> > > holding the actual source documents).
> > > 
> > > What is the best strategy for fast searches?
> > > Do I need to write a server which holds the indexes in
> > > a RAMDirectory?
> > > 
> > > When should RAMDirectories generally be used? I have read several
> > > articles saying that RAM drives under Linux are rarely a good idea,
> > > but am not sure how to interpret this in the context of Lucene and
> > > RAMDirectories.
> > > 
> > > I have looked over the documentation I have found on the Lucene web
> > > site; I hope I haven't missed something.
> > > 
> > > Is there a general guide to building large search engines with
> > > Lucene? I am very new to the whole field.
> > > 
> > > Thanks for the great software!
> > > jt
> > 
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
> 
> 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com
