lucene-java-user mailing list archives

From Toke Eskildsen
Subject Re: Are there any Lucene optimizations applicable to SSD?
Date Tue, 19 Aug 2008 09:29:54 GMT
On Tue, 2008-08-19 at 16:22 +0800, Cedric Ho wrote:

[Lucene on SSD]

> However it's still not good enough for our particular case. So I
> wonder if there are any tips for optimizing lucene performance on
> SSDs.

What aspect of performance do you find lacking? Is it searching or
indexing? While we've had stellar results for searches, indexing is only
slightly faster than on conventional harddisks.

As for optimizing towards SSDs, we've found that the CPU is the
bottleneck for us: on a 4 core system with a single 64GB SSD, the
performance keeps climbing markedly when going from 1 to 5 threads, and
it is nearly identical to the same system with a RAID 0 of 4 * 64GB SSDs.
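
A minimal sketch of how such a thread-scaling measurement could be set up,
using only the JDK. The class name ThreadScaling, the method measure and the
fakeSearch task are hypothetical; fakeSearch is a CPU-only stand-in that
would have to be replaced by a real search call against the index. If
queries/sec keeps rising with the thread count while the storage stays the
same, that points towards the CPU rather than the disk as the bottleneck.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

public class ThreadScaling {

    // Keeps the result of the stand-in work observable so it is not optimized away.
    static final AtomicLong SINK = new AtomicLong();

    /** Runs task totalQueries times, shared among the given number of threads; returns queries/sec. */
    static double measure(int threads, int totalQueries, Runnable task) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong remaining = new AtomicLong(totalQueries);
        // Each worker takes queries from the shared counter until none are left.
        Runnable worker = () -> {
            while (remaining.getAndDecrement() > 0) {
                task.run();
            }
        };
        List<Future<?>> futures = new ArrayList<>();
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            futures.add(pool.submit(worker));
        }
        for (Future<?> f : futures) {
            f.get(); // wait for completion and propagate any failure
        }
        long elapsed = Math.max(1L, System.nanoTime() - start);
        pool.shutdown();
        return totalQueries / (elapsed / 1e9);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for a search; replace with a real query.
        Runnable fakeSearch = () -> {
            long h = 17;
            for (int i = 0; i < 20_000; i++) {
                h = h * 31 + i;
            }
            SINK.addAndGet(h);
        };
        for (int threads = 1; threads <= 5; threads++) {
            double qps = measure(threads, 20_000, fakeSearch);
            System.out.printf("%d thread(s): %.0f queries/sec%n", threads, qps);
        }
    }
}
```

The numbers are only meaningful after warmup and with realistic queries.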

> For example, I saw that Lucene's BufferedIndexInput class will read
> 1024bytes off the disk each time. This certainly make sense on hard
> disk because of the seek latency involved. But would it actually
> hinder performance on SSD?

SSDs still retrieve data in blocks, so my _guess_ is that the 1024-byte
buffer doesn't make much of a difference.
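
One way to test the guess instead of relying on it is to time random reads
of different sizes against a file on the device in question. The sketch
below uses only the JDK, not Lucene; the class name ReadSizeProbe and the
method randomReads are hypothetical. Note that the temporary file created in
main will sit in the OS file cache, so for a real measurement the path
should point at a large index file and the cache should be cold.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

public class ReadSizeProbe {

    /** Does the given number of reads of bufferSize bytes at random positions; returns bytes read. */
    static long randomReads(Path file, int bufferSize, int reads, long seed) throws IOException {
        byte[] buffer = new byte[bufferSize];
        Random random = new Random(seed);
        long total = 0;
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long maxStart = raf.length() - bufferSize;
            if (maxStart < 0) {
                throw new IllegalArgumentException("file is smaller than the buffer");
            }
            for (int i = 0; i < reads; i++) {
                // Pick a start position so that a full buffer always fits.
                long pos = (long) (random.nextDouble() * (maxStart + 1));
                raf.seek(pos);
                raf.readFully(buffer);
                total += buffer.length;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Small placeholder file; use a real index file for meaningful numbers.
        Path file = Files.createTempFile("readprobe", ".bin");
        try {
            Files.write(file, new byte[8 * 1024 * 1024]);
            for (int size : new int[] {1024, 4096, 16384}) {
                long start = System.nanoTime();
                randomReads(file, size, 10_000, 42L);
                long micros = (System.nanoTime() - start) / 1_000;
                System.out.printf("%5d bytes/read: %d us for 10000 reads%n", size, micros);
            }
        } finally {
            Files.deleteIfExists(file);
        }
    }
}
```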

Which SSD did you choose?

> FYI, we were trying to fit an index about 20G in size into a single
> machine with 8G ram. And the searches we receive are vastly different.
> So it's not likely we can depends on the system's file cache to speed
> things up for us.

We've experimented with a 37GB index on a machine with the amount of RAM
varying from 3 to 24GB, primarily with simple searches. After warmup
(1000 queries), with 8GB and dual core, the performance for SSD is in
the area of 200 queries/sec and rising, as opposed to 50 queries/sec and
rising for conventional harddisks (see the graph under "Warming up" at ).

For searches with SSD, the size of the disk cache doesn't affect
performance much, but the first 1000 queries or so aren't representative
at all, no matter if the index is in RAM, on SSD or conventional
harddisks. Of course, YMMV.
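
To see the warmup effect in your own setup, throughput can be reported per
window of queries rather than as a single average. The following is a
sketch under assumptions: the class name WarmupWindows, the method
qpsPerWindow and the placeholder query are hypothetical, and the query would
have to be replaced by real searches drawn from a realistic query log.

```java
public class WarmupWindows {

    /** Runs windows * windowSize queries and returns queries/sec for each window. */
    static double[] qpsPerWindow(int windows, int windowSize, Runnable query) {
        double[] qps = new double[windows];
        for (int w = 0; w < windows; w++) {
            long start = System.nanoTime();
            for (int i = 0; i < windowSize; i++) {
                query.run();
            }
            long elapsed = Math.max(1L, System.nanoTime() - start);
            qps[w] = windowSize / (elapsed / 1e9);
        }
        return qps;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a search; replace with a real query.
        StringBuilder sink = new StringBuilder();
        Runnable query = () -> {
            sink.setLength(0);
            for (int i = 0; i < 1_000; i++) {
                sink.append(i % 10);
            }
        };
        double[] qps = qpsPerWindow(5, 1_000, query);
        for (int w = 0; w < qps.length; w++) {
            // The first window covers the warmup and should not be trusted.
            System.out.printf("queries %d-%d: %.0f queries/sec%n",
                    w * 1_000 + 1, (w + 1) * 1_000, qps[w]);
        }
    }
}
```

Discarding the first window and comparing the rest shows whether the
numbers have stabilized or are still rising.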

Could you give some more information on the searches? What is a typical
query, what do you do with the result (e.g. iterate through Hits,
extracting fields)?
