incubator-lucy-user mailing list archives

From Marvin Humphrey
Subject Re: [lucy-user] IO ponderings
Date Sat, 17 Sep 2011 19:47:48 GMT
On Sat, Sep 17, 2011 at 08:52:41AM +0200, goran kent wrote:
> I've been wondering (and I'll eventually get around to performing a
> comparative test sometime this weekend) about IO and search
> performance (ie, ignore OS caching).

Hmm, Lucy is actually designed to integrate with the system IO cache very
tightly!  We exploit mmap so that all Searchers are backed by the same IO
cache memory pages.  And if you have an Indexer going at the same time, the
new index data it just wrote is also in the IO cache, and so is available
immediately to a new Searcher.  Very little gets read into process RAM when
you open a Searcher.
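
To illustrate the mmap point, here is a minimal sketch in Python. It is not Lucy's actual code, and the file contents and the helper names `open_readonly_mapping` and `demo` are hypothetical stand-ins. Two independent read-only mappings of the same file see the same bytes, and the operating system serves them from its page cache rather than copying the file into each process's heap up front.

```python
import mmap
import os
import tempfile


def open_readonly_mapping(path):
    """Map a whole file read-only; pages are faulted in from the OS cache on access."""
    with open(path, "rb") as f:
        # Length 0 means "map the entire file". The mapping keeps its own
        # handle, so it stays valid after f is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def demo():
    # Hypothetical stand-in for an index file written by an indexer.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"term:lucy doc:42\n" * 1000)

        # Two independent "searchers" map the same file.
        first = open_readonly_mapping(path)
        second = open_readonly_mapping(path)
        try:
            # Slicing touches only the pages that are actually needed.
            return first[:16], second[:16], len(first)
        finally:
            first.close()
            second.close()
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

Opening a mapping is cheap regardless of file size, because nothing is read until a page is touched.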

"The OS is our JVM." - Lucy developer Nate Kurz.

> What's the biggest cause of search degradation when Lucy is chugging
> through its on-disk index?
> Physically *finding* data (ie, seeking and thrashing around the disk),
> waiting for data to *transfer* from the disk to CPU?

Well, the projects I've been involved with have taken the approach that there
should always be enough RAM on the box to fit the necessary index files.  "RAM
is the new disk" as they say.

I can tell you that once an index is in RAM, we're CPU bound.  I can't provide
you with analysis about performance characteristics when the index is not yet
in RAM, though.  We don't like to be in that state for very long. :)
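
As a rough way to check the "enough RAM to fit the index" rule of thumb, the sketch below sums the file sizes under an index directory and compares the total with physical memory. Everything in it is an assumption made for illustration: the helper names, the 0.5 headroom factor, and the use of `os.sysconf`, which is available on Linux and most Unix systems but not on Windows. It also counts every file, whereas a real deployment may only need some of the index files resident.

```python
import os


def index_size_bytes(index_dir):
    """Total size in bytes of all files under index_dir."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total


def physical_ram_bytes():
    """Physical memory in bytes (Unix only; relies on os.sysconf)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def fits_in_ram(index_bytes, ram_bytes, headroom=0.5):
    """True if the index fits within the given fraction of RAM.

    The headroom value is an arbitrary illustrative choice that leaves
    room for the rest of the system; tune it for your own machine.
    """
    return index_bytes <= ram_bytes * headroom
```

For example, `fits_in_ram(index_size_bytes("/path/to/index"), physical_ram_bytes())` gives a quick yes/no answer, where the path is a placeholder for your own index location.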

> I'm quite interested to know whether using an SSD where seek time and
> other latency issues are almost zero would dramatically improve search
> times.  I've seen vast improvements when using them in RDBMS', but
> this may not translate as well here.

I would speculate that with SSDs you'd get a more graceful performance
degradation as Lucy's RAM requirements start to exceed what the box can
provide.  But I have no numbers to back that up.

Marvin Humphrey
