lucene-dev mailing list archives

From "Spencer, Dave" <d...@lumos.com>
Subject RE: better, more accurate, RAMDirectory benchmark - RE: my submission though it's no faster - RE: Converting a FSDirectory (on disk index) to a RAMDirectory
Date Wed, 27 Feb 2002 18:59:13 GMT

Good point.

I'm running on Win2k with 512MB. Decent cpu, approx 1GHz.


-----Original Message-----
From: Scott Ganyo [mailto:scott.ganyo@eTapestry.com]
Sent: Wednesday, February 27, 2002 10:27 AM
To: 'Lucene Developers List'
Subject: RE: better, more accurate, RAMDirectory benchmark - RE: my
submission though it's no faster - RE: Converting a FSDirectory (on disk
index) to a RAMDirectory


Just curious, what OS is this and how much RAM do you have?  Some OSes
(Solaris, for example) apparently keep an in-memory disk cache using
available RAM in the box.  Something like this would negate much of any
difference between the two runs as all the data would be in memory one way
or another anyway...

> -----Original Message-----
> From: Spencer, Dave [mailto:dave@lumos.com]
> Sent: Wednesday, February 27, 2002 1:18 PM
> To: Lucene Developers List
> Subject: better, more accurate, RAMDirectory benchmark - RE: my
> submission though it's no faster - RE: Converting a 
> FSDirectory (on disk
> index) to a RAMDirectory
> 
> 
> Since the benchmark I ran yesterday had seemingly wrong results, and since the
> code was too complicated, I rewrote it into a simpler, more isolated benchmark.
> 
> Executive summary is that with FSDirectory I get an avg of about 850ms/query
> (time to process a query) and with RAMDirectory it's 775ms/query, thus things
> are a bit faster (roughly 9%).
> 
> Details: 
> 
> VM invoked as "java -Xverify:none -ms32m -mx256m"
> 
> Index is same as yesterday, 140k docs, 90MB data on disk.
> 
> I looped 1000 times for the test and each cycle thru the loop
> picked one of 6 queries. Most of the queries had 2 words, while one
> had 6.
> 
> I didn't do any special transformation on the query i.e. I just called
> Query query = QueryParser.parse(srch, DFields.CONTENTS, stopAnalyzer)
> to form the Query obj.
> 
> Code attached for review at least, though I stripped out the array of
> queries since they're based on my index which I'm not attaching (90MB..).
> 
> On reflection I think we'll be able to conclude that while this benchmark
> does show RAMDirectory is faster, it's still not that much faster, and that
> may mean that it's still not stressing out the FSDirectory impl enough. It
> might be that a more valid benchmark would be multi-threaded and use more
> queries so as to hit more of the disk hopefully...
> 
> 
> 
> FSDirectory:
> 
> Startup:: 60ms
> free :: 31MB
> total:: 32MB
> min:: 280ms
> max:: 2303ms
> avg:: 853ms
> free :: 30MB
> total:: 32MB
> 
> RAMDirectory:
> 
> Startup:: 6759ms
> free :: 65MB
> total:: 158MB
> min:: 250ms
> max:: 2153ms
> avg:: 775ms
> free :: 74MB
> total:: 168MB
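
For anyone wanting to reproduce the shape of this test without the attachment, here is a minimal sketch of the loop described above: a fixed number of iterations, each picking one query from a small array and recording min/max/avg latency. The class name `QueryBenchmark`, the random pick with a fixed seed, and the `Consumer<String>` stand-in for the actual search call are all assumptions; the original code is not shown in this thread, and it predates the lambda syntax used here.

```java
import java.util.Random;
import java.util.function.Consumer;

/** Hypothetical harness: times a search callback over a small set of queries. */
public class QueryBenchmark {

    /** Min, max and mean latency in milliseconds for one run. */
    public static final class Stats {
        public final long minMs;
        public final long maxMs;
        public final double avgMs;

        Stats(long minMs, long maxMs, double avgMs) {
            this.minMs = minMs;
            this.maxMs = maxMs;
            this.avgMs = avgMs;
        }
    }

    /**
     * Runs `iterations` searches, each with a query picked at random from
     * `queries` (assumption: the original may have picked them differently).
     * `search` stands in for whatever executes the query against the index.
     */
    public static Stats run(String[] queries, int iterations, long seed,
                            Consumer<String> search) {
        if (queries.length == 0 || iterations <= 0) {
            throw new IllegalArgumentException("need at least one query and one iteration");
        }
        Random rnd = new Random(seed);
        long min = Long.MAX_VALUE;
        long max = 0;
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            String query = queries[rnd.nextInt(queries.length)];
            long start = System.nanoTime();
            search.accept(query);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            min = Math.min(min, elapsedMs);
            max = Math.max(max, elapsedMs);
            total += elapsedMs;
        }
        return new Stats(min, max, (double) total / iterations);
    }

    public static void main(String[] args) {
        // Placeholder queries and a dummy "search"; swap in a real searcher call.
        String[] queries = {"alpha beta", "gamma delta", "one two three four five six"};
        Stats s = run(queries, 1000, 42L, q -> q.hashCode());
        System.out.println("min:: " + s.minMs + "ms");
        System.out.println("max:: " + s.maxMs + "ms");
        System.out.println("avg:: " + s.avgMs + "ms");
    }
}
```

Running the same harness once per directory implementation, with the same seed, keeps the query sequence identical between the two runs, so any difference comes from the directory rather than from the query mix.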
> 
> -----Original Message-----
> From: Spencer, Dave 
> Sent: Tuesday, February 26, 2002 4:08 PM
> To: Lucene Developers List
> Subject: my submission though it's no faster - RE: Converting a
> FSDirectory (on disk index) to a RAMDirectory
> 
> 
> I've attached a modified version of RAMDirectory that has an additional,
> "copy" constructor for creating a RAMDirectory based on another, existing,
> Directory, presumably a FSDirectory (i.e. an on-disk directory).
> 
> Below Doug asked for 2 additional ctrs but I didn't add them since the
> appropriate ctrs in FSDirectory didn't exist and I wasn't sure if I should
> add the 'boolean create' flags when the implicit FSDirectory.getDirectory
> call was made...and because I wanted to get this "out the door".
> 
> I based this on the last nightly release (2-26) which I just loaded.
> It compiles, and I ran the "ant test" target too, successfully.
> 
> I ran my own humble benchmark and it seemed that running with a RAMDirectory
> and a FSDirectory leads to equal times for searches, though the RAMDirectory
> case takes up more RAM (as expected) and takes longer to initialize
> (again, as expected).
> 
> The test runs against a database that takes 90MB on disk and has 140,000
> entries.
> It applies my unpublished SubstringQuery on a couple of words and ends up
> with a query of 100 or so terms w/ boosting and checks against 3 fields.
> The query returns approx 400 matches in both tests (thus a sanity check was
> made that both directories return the same # of matches).
> 
> Memory is measured twice, by calling into java.lang.Runtime,
> freeMemory() and totalMemory()
> (1) after the directory is created and
> (2) just before the test finishes.
> Before measuring memory I call System.gc() to try to get rid of junk in
> the system.
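
The measurement described above can be sketched roughly as follows. The class name `MemorySnapshot` and the output format are assumptions chosen to resemble the "free :: / total::" lines in this thread; only `Runtime.freeMemory()`, `Runtime.totalMemory()` and `System.gc()` come from the description.

```java
/** Hypothetical helper: one free/total heap reading taken after a GC hint. */
public class MemorySnapshot {
    private static final long MB = 1024L * 1024L;

    public final long freeBytes;
    public final long totalBytes;

    private MemorySnapshot(long freeBytes, long totalBytes) {
        this.freeBytes = freeBytes;
        this.totalBytes = totalBytes;
    }

    /** Requests a collection, then reads the current heap figures. */
    public static MemorySnapshot take() {
        // System.gc() is only a hint; the VM may ignore it, so readings are approximate.
        System.gc();
        Runtime rt = Runtime.getRuntime();
        return new MemorySnapshot(rt.freeMemory(), rt.totalMemory());
    }

    /** Heap currently in use, i.e. total minus free. */
    public long usedBytes() {
        return totalBytes - freeBytes;
    }

    @Override
    public String toString() {
        return "free :: " + (freeBytes / MB) + "MB, total:: " + (totalBytes / MB) + "MB";
    }

    public static void main(String[] args) {
        System.out.println("after startup: " + take());
        byte[] ballast = new byte[8 * 1024 * 1024]; // stand-in for loading an index
        System.out.println("after load:    " + take());
        System.out.println("ballast bytes: " + ballast.length);
    }
}
```

Note that free and total both move when the heap grows from -ms toward -mx, so the used figure (total minus free) is usually the easier one to compare between runs.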
> 
> The VM is invoked like this:
> 	java -Xverify:none -ms32m -mx256m
> The verify:none tells the vm not to perform sanity checks on the class
> files.
> Consequence is it starts up faster and runs fine if your tree is
> compiled.
> 
> The queries were run 25 times each in one run, and later 5 times each -
> results are for the last, shorter run.
> 
>       startup   free/total (1)   free/total (2)   min/max/avg (ms)
> fs    10ms      31/32MB          30/32MB          10064/10274/10114
> ram   6889ms    65/159MB         67/164MB         10124/10384/10192
> 
> So all the numbers are more or less as expected except for the times at
> the end - they're almost identical which is kinda weird. I even tried
> rerunning the ram test and deleting the database after it started to "prove"
> that it's reading out of ram, and I get the same numbers [note: yes I mean
> 'delete', not 'rename', just in case something funny could be happening].
> 
> At the moment I can't easily publish my benchmark code but will do so if
> it's needed later this week.
> 
> I suggest that this version of RAMDirectory be added to the src base as
> after all, the ctr itself is reasonable. I'd like to know if anyone has
> run w/ RAMDirectory and proven that it's faster.
> 
> My conclusion from the tests I've run is that FSDirectory must be doing
> good buffering/reading such that an in-memory directory has no benefit.
> 
> PS
>   I really gotta hit send, but I have a feeling the benchmark is invalid
>   since I reran the same query over and over again, and thus didn't stress
>   out the filesystem i/o since we all know lucene is well implemented and
>   probably doesn't do that much i/o. Maybe I need another pass where the
>   benchmark cycles thru a number of diff queries, thus the FSDirectory
>   should have to hit the disk more...
> 
> 
> 
> 
> 
> 
> 
> -----Original Message-----
> From: Doug Cutting [mailto:DCutting@grandcentral.com]
> Sent: Thursday, February 21, 2002 1:33 PM
> To: 'Lucene Developers List'
> Subject: RE: Converting a FSDirectory (on disk index) to a 
> RAMDirectory
> 
> 
> > From: Spencer, Dave [mailto:dave@lumos.com]
> > 
> > Could anyone glance at this and verify that this code is correct.
> > Goal is to convert an existing, on-disk, index to a 
> > RAMDirectory, which presumably is purely in memory.
> 
> It looks right to me.  Did you test it?  Did it work?
> 
> > If the code is correct I'd suggest someone w/ CVS powers adding it to
> > the source base - maybe a static method in RAMDirectory itself.
> 
> How about a RAMDirectory constructor?  Since only generic Directory methods
> are required it could just be:
> 
>   public RAMDirectory(Directory dirToCopy) { ... }
> 
> and, as conveniences:
> 
>   public RAMDirectory(File f)   { this(new FSDirectory(f)); }
>   public RAMDirectory(String s) { this(new FSDirectory(s)); }
> 
> Doug
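
To illustrate the idea of a "copy" constructor without depending on the Lucene classes (whose Directory API is not shown in this thread), here is a rough stand-in using only the JDK: every regular file in an on-disk directory is read into a map keyed by file name. `InMemoryCopy`, `list()`, `read()` and `demo()` are hypothetical names for this sketch and are not the RAMDirectory implementation that was attached.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical stand-in for a directory that is copied fully into memory. */
public class InMemoryCopy {
    private final Map<String, byte[]> files = new TreeMap<>();

    /** "Copy" constructor: loads every regular file of dirToCopy into memory. */
    public InMemoryCopy(Path dirToCopy) throws IOException {
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dirToCopy)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    files.put(entry.getFileName().toString(), Files.readAllBytes(entry));
                }
            }
        }
    }

    /** Names of the files held in memory. */
    public Set<String> list() {
        return Collections.unmodifiableSet(files.keySet());
    }

    /** Returns a copy of the named file's bytes. */
    public byte[] read(String name) throws FileNotFoundException {
        byte[] data = files.get(name);
        if (data == null) {
            throw new FileNotFoundException(name);
        }
        return data.clone();
    }

    /** Copies a temp directory, deletes it on disk, then reads from memory. */
    public static boolean demo() {
        try {
            Path dir = Files.createTempDirectory("inmemcopy");
            Path file = dir.resolve("segments");
            byte[] original = "hello index".getBytes(StandardCharsets.UTF_8);
            Files.write(file, original);
            InMemoryCopy copy = new InMemoryCopy(dir);
            Files.delete(file); // remove the on-disk data after copying
            Files.delete(dir);
            return copy.list().contains("segments")
                    && Arrays.equals(original, copy.read("segments"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("copy survives deletion: " + demo());
    }
}
```

The `demo()` method mirrors the check mentioned elsewhere in the thread: delete the on-disk data after the copy is made and confirm reads still succeed, which shows the data really is being served from memory.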
> 
> --
> To unsubscribe, e-mail:
> <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
> For additional commands, e-mail:
> <mailto:lucene-dev-help@jakarta.apache.org>
> 
> 

