lucene-dev mailing list archives

From "Spencer, Dave" <>
Subject better, more accurate, RAMDirectory benchmark - RE: my submission though it's no faster - RE: Converting a FSDirectory (on disk index) to a RAMDirectory
Date Wed, 27 Feb 2002 18:17:55 GMT
Since the benchmark I ran yesterday had seemingly wrong results, and
since the code was too complicated, I rewrote it into a simpler, more
isolated benchmark.

Executive summary is that with FSDirectory I get an avg of 850ms/query
(time to process a query) and with RAMDirectory it's 775ms/query, thus
things are a bit faster (roughly 9%) with RAMDirectory.


VM invoked as "java -Xverify:none -ms32m -mx256m"

Index is same as yesterday, 140k docs, 90MB data on disk.

I looped 1000 times for the test and each cycle thru the loop
picked one of 6 queries. Most of the queries had 2 words, while one
had 6.
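
A minimal sketch of a timing loop of this shape, for anyone who wants to reproduce the measurement. It is not the attached benchmark code: the class name QueryBenchmark, the search callback, the placeholder queries and the choice to pick queries at random are all assumptions made for illustration.

```java
import java.util.Random;
import java.util.function.Consumer;

/** Illustrative timing loop; not the benchmark code attached to the message. */
final class QueryBenchmark {

    /** Minimum, maximum and average latency, in milliseconds. */
    static final class Stats {
        final long minMs;
        final long maxMs;
        final double avgMs;

        Stats(long minMs, long maxMs, double avgMs) {
            this.minMs = minMs;
            this.maxMs = maxMs;
            this.avgMs = avgMs;
        }
    }

    /**
     * Runs the given number of cycles. Each cycle picks one query at random
     * (an assumption) and times a single call to the search callback.
     */
    static Stats run(String[] queries, int iterations, long seed, Consumer<String> search) {
        if (queries.length == 0 || iterations <= 0) {
            throw new IllegalArgumentException("need at least one query and one iteration");
        }
        Random random = new Random(seed);
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            String query = queries[random.nextInt(queries.length)];
            long start = System.nanoTime();
            search.accept(query);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            min = Math.min(min, elapsedMs);
            max = Math.max(max, elapsedMs);
            total += elapsedMs;
        }
        return new Stats(min, max, (double) total / iterations);
    }

    public static void main(String[] args) {
        // Placeholder queries; the real ones depend on the index being searched.
        String[] queries = {"alpha beta", "gamma delta", "epsilon zeta",
                            "eta theta", "iota kappa", "one two three four five six"};
        Stats stats = run(queries, 1000, 42L, query -> {
            // Hypothetical hook: parse the query and run it against the index here.
        });
        System.out.println("min:: " + stats.minMs + "ms");
        System.out.println("max:: " + stats.maxMs + "ms");
        System.out.println("avg:: " + stats.avgMs + "ms");
    }
}
```

Passing the search as a callback keeps the loop identical for both directory implementations, so only the directory under test changes between runs.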

I didn't do any special transformation on the query i.e. I just called
Query query = QueryParser.parse(srch, DFields.CONTENTS, stopAnalyzer)
to form the Query obj.

Code attached for review at least, though I stripped out the array of
queries since they're based
on my index which I'm not attaching (90MB..).

On reflection I think we'll be able to conclude that while this
benchmark does show RAMDirectory is faster, it's still not that much
faster, and that may mean that it's still not stressing out the
FSDirectory impl enough. It might be that a more valid benchmark would
be multi-threaded and use more queries so as to hit more of the disk.
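
One possible shape for such a multi-threaded run, as a rough sketch only. The class name ConcurrentQueryBenchmark, the thread and iteration counts and the search callback are hypothetical; nothing here comes from the attached code, and it reports only an overall average.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

/** Illustrative concurrent timing loop; all names are hypothetical. */
final class ConcurrentQueryBenchmark {

    /** Returns the average latency in ms over threads * perThread searches. */
    static double averageMillis(String[] queries, int threads, int perThread,
                                Consumer<String> search) {
        if (queries.length == 0 || threads <= 0 || perThread <= 0) {
            throw new IllegalArgumentException("need queries, threads and iterations");
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong totalNanos = new AtomicLong();
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                for (int i = 0; i < perThread; i++) {
                    // Each worker picks its own random query per cycle.
                    int pick = ThreadLocalRandom.current().nextInt(queries.length);
                    long start = System.nanoTime();
                    search.accept(queries[pick]);
                    totalNanos.addAndGet(System.nanoTime() - start);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
                throw new IllegalStateException("benchmark did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return totalNanos.get() / 1e6 / ((long) threads * perThread);
    }

    public static void main(String[] args) {
        String[] queries = {"alpha beta", "gamma delta", "epsilon zeta"};
        double avg = averageMillis(queries, 4, 250, query -> {
            // Hypothetical hook: run the real search here.
        });
        System.out.println("avg:: " + avg + "ms");
    }
}
```

The search callback must be safe to call from several threads at once, which is worth checking for whatever searcher object is shared between them.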


FSDirectory:
Startup:: 60ms
free :: 31MB
total:: 32MB
min:: 280ms
max:: 2303ms
avg:: 853ms
free :: 30MB
total:: 32MB


RAMDirectory:
Startup:: 6759ms
free :: 65MB
total:: 158MB
min:: 250ms
max:: 2153ms
avg:: 775ms
free :: 74MB
total:: 168MB

-----Original Message-----
From: Spencer, Dave 
Sent: Tuesday, February 26, 2002 4:08 PM
To: Lucene Developers List
Subject: my submission though it's no faster - RE: Converting a
FSDirectory (on disk index) to a RAMDirectory

I've attached a modified version of RAMDirectory that has an additional,
"copy" constructor for creating a RAMDirectory based on another
Directory, presumably an FSDirectory (i.e. an on-disk directory).
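
To illustrate the idea of such a copy constructor without depending on Lucene, here is a toy sketch that reads every file of an on-disk directory into memory. InMemoryFiles is a hypothetical stand-in, not the attached RAMDirectory, and it uses plain java.nio file calls rather than Lucene's Directory interface.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

/** Toy stand-in for an in-memory directory; this is not Lucene's RAMDirectory. */
final class InMemoryFiles {
    private final Map<String, byte[]> files = new TreeMap<>();

    /** "Copy" constructor: reads every regular file directly inside dir into memory. */
    InMemoryFiles(Path dir) {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    files.put(entry.getFileName().toString(), Files.readAllBytes(entry));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns a copy of the named file's bytes. */
    byte[] read(String name) {
        byte[] data = files.get(name);
        if (data == null) {
            throw new IllegalArgumentException("no such file: " + name);
        }
        return data.clone();
    }

    int fileCount() {
        return files.size();
    }

    /** Total number of bytes held in memory. */
    long totalBytes() {
        long sum = 0;
        for (byte[] data : files.values()) {
            sum += data.length;
        }
        return sum;
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java InMemoryFiles <directory>");
            return;
        }
        long start = System.nanoTime();
        InMemoryFiles ram = new InMemoryFiles(Paths.get(args[0]));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("copied " + ram.fileCount() + " files, "
                + ram.totalBytes() + " bytes in " + elapsedMs + "ms");
    }
}
```

Because everything is read up front, the startup cost grows with the size of the directory, which matches the longer initialization time reported for the in-memory case.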

Below Doug asked for 2 additional ctrs but I didn't add them since 
the appropriate ctrs in FSDirectory didn't exist and I wasn't sure
if I should add the 'boolean create' flags when the implicit
call was made...and because I wanted to get this "out the door".

I based this on the last nightly release (2-26) which I just loaded.
It compiles, and I ran the "ant test" target too, successfully.

I ran my own humble benchmark and it seemed that running with a
RAMDirectory and an FSDirectory leads to equal times for searches,
though the RAMDirectory case takes up more RAM (as expected) and takes
longer to initialize (again, as expected).

The test runs against a database that takes 90MB on disk and has 140,000
documents. It applies my unpublished SubstringQuery on a couple of words
and ends up with a query of 100 or so terms w/ boosting and checks
against 3 fields. The query returns approx 400 matches in both tests
(thus a sanity check was made that both directories return the same
# of matches).

Memory is measured twice, by calling into java.lang.Runtime,
freeMemory() and totalMemory()
(1) after the directory is created and
(2) just before the test finishes.
Before measuring memory I call System.gc() to try to get rid of junk in
the system.
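
A small sketch of that measurement, assuming a hypothetical helper class named MemorySnapshot. The Runtime and System.gc() calls are standard Java; the class itself and its output format are only illustrative.

```java
/** Illustrative helper for the free/total readings; the class name is hypothetical. */
final class MemorySnapshot {
    private static final long MB = 1024L * 1024L;

    final long freeMb;
    final long totalMb;

    private MemorySnapshot(long freeMb, long totalMb) {
        this.freeMb = freeMb;
        this.totalMb = totalMb;
    }

    /** Requests a GC, then reads the JVM's free and total heap in whole megabytes. */
    static MemorySnapshot take() {
        Runtime runtime = Runtime.getRuntime();
        System.gc(); // Only a hint: the JVM may or may not collect right away.
        return new MemorySnapshot(runtime.freeMemory() / MB, runtime.totalMemory() / MB);
    }

    @Override
    public String toString() {
        return "free :: " + freeMb + "MB, total:: " + totalMb + "MB";
    }

    public static void main(String[] args) {
        // (1) would be taken right after the directory is created ...
        System.out.println(MemorySnapshot.take());
        // ... and (2) just before the test finishes.
        System.out.println(MemorySnapshot.take());
    }
}
```

Note that totalMemory() is the heap currently reserved by the VM, not the -mx limit, so the total can grow between the two readings.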

The VM is invoked like this:
	java -Xverify:none -ms32m -mx256m
The -Xverify:none flag tells the VM not to perform sanity checks on the
classes it loads.
Consequence is it starts up faster and runs fine if your tree is

The queries were run 25 times each in one run, and later 5 times each -
results are for the
last, shorter run.

        startup   free/total   free/total   min/max/avg (ms)
fs      10ms      31/32mb      30/32mb      10064/10274/10114
ram     6889ms    65/159mb     67/164mb     10124/10384/10192

So all the numbers are more or less as expected except for the times at
the end - they're almost identical, which is kinda weird. I even tried
rerunning the ram test and deleting the database after it started to
"prove" that it's reading out of ram, and I get the same numbers [note:
yes I mean 'delete', not 'rename', just in case something funny could
be happening].

At the moment I can't easily publish my benchmark code but will do so if
it's needed
later this week.

I suggest that this version of RAMDirectory be added to the src base as,
after all, the ctr itself is reasonable. I'd like to know if anyone has
run w/ RAMDirectory and proven that it's faster.

My conclusion from the tests I've run is that FSDirectory must be doing
good buffering/reading such that an in-memory directory has no benefit.

  I really gotta hit send, but I have a feeling the benchmark is invalid
since I reran the
  same query over and over again, and thus didn't stress out the
filesystem i/o since
  we all know lucene is well implemented and probably doesn't do that
much i/o. Maybe I need
  another pass where the benchmark cycles thru a number of diff queries,
thus the FSDirectory
  should have to hit the disk more...

-----Original Message-----
From: Doug Cutting []
Sent: Thursday, February 21, 2002 1:33 PM
To: 'Lucene Developers List'
Subject: RE: Converting a FSDirectory (on disk index) to a RAMDirectory

> From: Spencer, Dave []
> Could anyone glance at this and verify that this code is correct.
> Goal is to convert an existing, on-disk, index to a 
> RAMDirectory, which presumably is purely in memory.

It looks right to me.  Did you test it?  Did it work?

> If the code is correct I'd suggest someone w/ CVS powers adding it to
> the source base - maybe a static method in  RAMDirectory itself.

How about a RAMDirectory constructor?  Since only generic Directory
methods are required it could just be:

  public RAMDirectory(Directory dirToCopy) { ... }

and, as conveniences:

  public RAMDirectory(File f)   { this(new FSDirectory(f)); }
  public RAMDirectory(String s) { this(new FSDirectory(s)); }

