lucene-java-user mailing list archives

From "Cedric Ho" <>
Subject Re: Are there any Lucene optimizations applicable to SSD?
Date Tue, 19 Aug 2008 16:25:37 GMT

Thanks for the reply =)

> What aspect of performance do you find lacking? Is it searching or
> indexing? While we've had stellar results for searches, indexing is just
> so-so better than conventional harddisks.

Search response time. We replayed the search log from our production
system against an index on the SSD. The results show that 75% of queries
return within 1 second and 90% return within 2.5 seconds; the remaining 10%
range from 2.5 seconds to just under 100 seconds.

The total number of queries is ~40000, so about 10000 queries are kind of
slow and 1000 queries are very slow. But those 10% of very slow queries do
not come from the first 1000 queries. They are more or less evenly
distributed across the run.
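
A minimal sketch of how figures like these can be derived from raw
per-query timings. The class and method names are hypothetical and the
nearest-rank percentile definition is an assumption; this is not taken
from our actual test harness.

```java
import java.util.Arrays;

/** Hypothetical helper: summarizes query latencies (milliseconds) from a test run. */
final class LatencySummary {

    /**
     * Nearest-rank percentile: the smallest sample such that at least
     * `percent` percent of all samples are less than or equal to it.
     */
    static long percentile(long[] latenciesMs, int percent) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (percent < 1 || percent > 100) {
            throw new IllegalArgumentException("percent must be in 1..100");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        // Integer ceiling of percent * n / 100 gives the 1-based rank.
        int rank = (int) (((long) percent * sorted.length + 99) / 100);
        return sorted[rank - 1];
    }

    /** Fraction of samples that completed within limitMs. */
    static double fractionWithin(long[] latenciesMs, long limitMs) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int count = 0;
        for (long latency : latenciesMs) {
            if (latency <= limitMs) {
                count++;
            }
        }
        return (double) count / latenciesMs.length;
    }
}
```

With this, checking a target such as ">90% under 1 sec" amounts to
comparing fractionWithin(latencies, 1000) against 0.9.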

> As for optimizing towards SSDs, we've found that the CPU is the
> bottleneck for us: The performance keeps climbing markedly for 1-5
> threads on a 4 core system with a single 64GB SSD, nearly identical to
> the same system with a RAID 0 of 4 * 64GB SSD.

I'd guess our CPU is fine because our test is probably different than
yours. We take one day's search log and replay the exact search
queries against the index at the exact times they occurred in the log.
So most of the time the CPU is idle, except maybe during peak hours.
(I'll remember to take a look at the CPU utilization during the peak
hours of the test tomorrow.)
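
To illustrate the replay idea, here is a rough sketch that schedules
each logged query at its original offset from the start of the log. All
names (LogReplayer, Entry, the searcher callback) are hypothetical, the
log is assumed to be sorted by timestamp, and the pool size of 4 is an
arbitrary choice.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical replayer: re-issues logged queries with their original spacing. */
final class LogReplayer {

    /** One logged search: when it was issued (epoch millis) and the query text. */
    static final class Entry {
        final long timestampMs;
        final String query;

        Entry(long timestampMs, String query) {
            this.timestampMs = timestampMs;
            this.query = query;
        }
    }

    /** Delay of each entry relative to the first one; out-of-order entries get 0. */
    static long[] delaysMs(List<Entry> log) {
        long[] delays = new long[log.size()];
        if (log.isEmpty()) {
            return delays;
        }
        long start = log.get(0).timestampMs;
        for (int i = 0; i < delays.length; i++) {
            delays[i] = Math.max(0, log.get(i).timestampMs - start);
        }
        return delays;
    }

    /** Runs every query through `searcher` at its original offset, then waits for all to finish. */
    static void replay(List<Entry> log, Consumer<String> searcher) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(4);
        long[] delays = delaysMs(log);
        for (int i = 0; i < delays.length; i++) {
            final String query = log.get(i).query;
            pool.schedule(() -> searcher.accept(query), delays[i], TimeUnit.MILLISECONDS);
        }
        // By default, already-scheduled delayed tasks still run after shutdown().
        pool.shutdown();
        try {
            pool.awaitTermination(Long.MAX_VALUE, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

Because queries fire on a fixed schedule rather than back to back, the
CPU sits idle between them, which is the difference from a
run-as-fast-as-possible benchmark.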

Your test keeps running queries as fast as it can. And since your
queries return so quickly, I'd guess that's probably why your
CPU gets hot =)

> Which SSD did you choose?

It's a single OCZ 64G SSD. We just got it yesterday. Is there a big
difference between different SSDs?

> Could you give some more information on the searches? What is a typical
> query, what do you do with the result (e.g. iterate through Hits,
> extracting fields)?

Our search queries are quite complicated sometimes.

All queries involve a Date Range Filter and a Publication Filter.
We use CachingWrapperFilter for the Publication Filter because there
are only a limited number of combinations for this filter. For the
Date Range Filter we just let it run every time, which seems to be
doing fine.
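
The reason caching pays off for the Publication Filter can be sketched
in plain Java: with few distinct combinations, the set of matching
documents is computed once per combination and then reused. This is only
an illustration of the idea, not Lucene's implementation; the class
name, the comma-joined key (which assumes publication names contain no
commas) and the compute callback are all hypothetical.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache of matching-document bit sets, keyed by publication combination. */
final class PublicationFilterCache {

    private final Map<String, BitSet> cache = new ConcurrentHashMap<>();
    private final Function<Set<String>, BitSet> compute;

    /** `compute` stands in for the expensive step of walking the index. */
    PublicationFilterCache(Function<Set<String>, BitSet> compute) {
        this.compute = compute;
    }

    /**
     * Returns the bit set for this combination, computing it at most once.
     * The returned BitSet is shared, so callers must treat it as read-only.
     */
    BitSet bitsFor(Set<String> publications) {
        // Sort the names so that {A, B} and {B, A} share one cache entry.
        String key = String.join(",", new TreeSet<String>(publications));
        return cache.computeIfAbsent(key, k -> compute.apply(publications));
    }

    /** Number of distinct combinations cached so far. */
    int size() {
        return cache.size();
    }
}
```

A date range, by contrast, can differ on almost every query, so a cache
keyed like this would rarely hit.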

The queries also range from simple term queries to PhraseQueries to
nested SpanQueries. More than 10 search terms in a query is not uncommon.

Sorting by date or publication is the norm; sometimes we also sort by score.

There are 3 returned fields (docId, date and publication), all of which
we retrieve through the FieldCache.

And we use this method to do the search:
TopFieldDocs search(Query query, Filter filter, int n, Sort sort)
where for the test run n=100

We are aiming to get >90% of queries to return in under 1 sec. Of
course, the more the better =)

Cedric Ho
