lucene-java-user mailing list archives

From Jonathan Hager <jha...@gmail.com>
Subject Re: Index in RAM - is it realy worthy?
Date Wed, 24 Nov 2004 22:26:40 GMT
When comparing RAMDirectory and FSDirectory, it is important to mention
which OS you are using.  Linux keeps recently accessed disk data cached
in memory.  Here is a good article that describes its
strategy: http://forums.gentoo.org/viewtopic.php?t=175419

The 2% difference you are seeing is the memory copy.  With other OSes
you may see a speedup when using the RAMDirectory, because not all
OSes keep a disk cache in memory; those must access the disk to read
the index.

Another consideration is that there is currently a 2GB limit on the
size of a RAMDirectory.  An index over 2GB causes an overflow in the
int used to create the buffer.  [see int len = (int) is.length(); in
RAMDirectory]
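
To make the overflow concrete, here is a minimal standalone sketch of
what a narrowing cast like the one quoted above does to a length of 2GB
or more.  It does not use Lucene at all; the class and method names
(LengthOverflowDemo, narrow, checkedNarrow) are made up for
illustration, and checkedNarrow is only one possible guard, not
something RAMDirectory does.

```java
public class LengthOverflowDemo {
    // Same kind of cast as "(int) is.length()": a long length squeezed into an int.
    static int narrow(long length) {
        return (int) length;
    }

    // Hypothetical guard: fail loudly instead of silently wrapping around.
    static int checkedNarrow(long length) {
        return Math.toIntExact(length); // throws ArithmeticException if it does not fit
    }

    public static void main(String[] args) {
        long twoGb = 1L << 31;            // 2,147,483,648 bytes
        long threeGb = 3L * (1L << 30);   // 3,221,225,472 bytes

        System.out.println(narrow(twoGb - 1)); // 2147483647, the largest length that fits
        System.out.println(narrow(twoGb));     // -2147483648, wrapped to a negative int
        System.out.println(narrow(threeGb));   // -1073741824

        try {
            checkedNarrow(twoGb);
        } catch (ArithmeticException e) {
            System.out.println("length does not fit in an int");
        }
    }
}
```

A negative length is then useless as a buffer size, which is why the
limit shows up at 2GB rather than at some larger number.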

I ended up using RAMDirectory for a very different reason.  The index
is 1 to 2MB and is rebuilt every few hours.  It takes 3 to 4 minutes
to query the database and rebuild the index.  But the search should be
available 100% of the time.  Since the index is so small I do the
following:

on server startup:
- look for semaphore, if it is there delete the index
- if there is no index, build it to FSDirectory
- load the index from FSDirectory into RAMDirectory

on reindex:
- create semaphore
- rebuild index to FSDirectory
- delete semaphore
- load index from FSDirectory into RAMDirectory

to search:
- search the RAMDirectory
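
The steps above can be sketched roughly as follows.  This is plain Java
rather than Lucene code: a directory of files stands in for the
FSDirectory, a Map of byte arrays stands in for the RAMDirectory, and
the semaphore is an empty marker file next to the index directory.  All
names here (ReindexSketch, IndexBuilder, the ".rebuilding" suffix) are
hypothetical, and clearing the old files before a rebuild is an
assumption of the sketch.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class ReindexSketch {
    /** Hypothetical hook for the slow "query the database and rebuild" step. */
    public interface IndexBuilder {
        void buildInto(Path indexDir) throws IOException;
    }

    private final Path indexDir;   // stands in for the FSDirectory
    private final Path semaphore;  // marker file: exists only while a rebuild is running
    private final IndexBuilder builder;
    // Stands in for the RAMDirectory; replaced as a whole after each load.
    private volatile Map<String, byte[]> inMemory = new HashMap<>();

    public ReindexSketch(Path indexDir, IndexBuilder builder) {
        this.indexDir = indexDir;
        this.semaphore = indexDir.resolveSibling(indexDir.getFileName() + ".rebuilding");
        this.builder = builder;
    }

    /** On server startup. */
    public void startup() throws IOException {
        if (Files.exists(semaphore)) {
            deleteIndex();          // a rebuild was interrupted, so the files are suspect
        }
        if (isIndexMissing()) {
            reindex();              // slow path: rebuild, then load
        } else {
            inMemory = load();      // fast path: reuse the persisted index
        }
    }

    /** On reindex; searches keep using the old in-memory copy until the last line. */
    public void reindex() throws IOException {
        if (!Files.exists(semaphore)) {
            Files.createFile(semaphore);
        }
        deleteIndex();
        Files.createDirectories(indexDir);
        builder.buildInto(indexDir);
        Files.delete(semaphore);
        inMemory = load();
    }

    /** To search: only the in-memory copy is consulted. */
    public byte[] read(String fileName) {
        return inMemory.get(fileName);
    }

    private boolean isIndexMissing() throws IOException {
        if (!Files.isDirectory(indexDir)) {
            return true;
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            return !files.iterator().hasNext();
        }
    }

    private void deleteIndex() throws IOException {
        if (!Files.isDirectory(indexDir)) {
            return;
        }
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path f : files) {
                Files.delete(f);    // the sketch assumes a flat directory of files
            }
        }
    }

    private Map<String, byte[]> load() throws IOException {
        Map<String, byte[]> copy = new HashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path f : files) {
                copy.put(f.getFileName().toString(), Files.readAllBytes(f));
            }
        }
        return copy;
    }
}
```

In this sketch a scheduler would call reindex() every few hours, while
request threads only ever call read(), so a rebuild never blocks a
search.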

RAMDirectory could be replaced by a regular FSDirectory, but it seemed
silly to copy the index from disk to disk, when it ultimately needs to
be in memory.

FSDirectory could be replaced by a RAMDirectory, but this means that
the server would take 3 to 4 minutes longer to start up every time.
By persisting the index, this time is only needed if indexing
was interrupted.

Jonathan

On Mon, 22 Nov 2004 12:39:07 -0800, Kevin A. Burton
<burton@newsmonster.org> wrote:
> Otis Gospodnetic wrote:
> 
> >For the Lucene book I wrote some test cases that compare FSDirectory
> >and RAMDirectory.  What I found was that with certain settings
> >FSDirectory was almost as fast as RAMDirectory.  Personally, I would
> >push FSDirectory and hope that the OS and the Filesystem do their share
> >of work and caching for me before looking for ways to optimize my code.
> >
> >
> Yes... I performed the same benchmark and in my situation RAMDirectory
> for searches was about 2% slower.
> 
> I'm willing to bet that it has to do with the fact that it's a Hashtable
> and not a HashMap (which isn't synchronized).
> 
> Also adding a constructor for the term size could make loading a
> RAMDirectory faster since you could prevent rehash.
> 
> If you're on a modern machine your filesystem cache will end up
> buffering your disk anyway which I'm sure was happening in my situation.
> 
> Kevin
> 
> --
> 
> Use Rojo (RSS/Atom aggregator).  Visit http://rojo.com. Ask me for an
> invite!  Also see irc.freenode.net #rojo if you want to chat.
> 
> Rojo is Hiring! - http://www.rojonetworks.com/JobsAtRojo.html
> 
> If you're interested in RSS, Weblogs, Social Networking, etc... then you
> should work for Rojo!  If you recommend someone and we hire them you'll
> get a free iPod!
> 
> Kevin A. Burton, Location - San Francisco, CA
>        AIM/YIM - sfburtonator,  Web - http://peerfear.org/
> GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org
> 
>


