lucene-solr-user mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject RE: Hardware Specs Question
Date Mon, 06 Sep 2010 19:35:13 GMT
From: Dennis Gearon [gearond@sbcglobal.net]:
> I wouldn't have thought that CPU was a big deal with the speed/cores of CPUs
> continuously growing according to Moore's law and the change in disk speed
> barely changing 50% in 15 years. Must have a lot to do with caching.

I am not sure I follow you. When seek times are suddenly 100 times faster (a slight exaggeration,
but only slight), why wouldn't the bottleneck move? Yes, CPUs have increased tremendously
in speed, but so have our processing needs. Lucene (and by extension Solr) was made with long
seek times in mind, and looking at the current market, it makes sense to continue supporting
this for some years. If the software were optimized for sub-ms seek times, it might lower CPU
usage or at the very least lower the need for caching (internal as well as external).

> What size indexes are you working with?

Around 40GB for our primary index. 9 million documents, AFAIR.

> Are you saying you can get the whole thing in memory?

No. For that test we had to reduce the index to 14GB on our 24GB test machine with Lucene's
RAMDirectory. In order to avoid the "everything is cached and thus everything is the same
speed" problem, we lowered the amount of available memory to 3GB when we measured harddisk
& SSD speed against the 14GB index. The CliffsNotes version is harddisks 200 raw queries/second,
SSDs 774 queries/second and RAM 952 queries/second, but as always it is not so simple to extract
a single number for performance when warm-up and caching come into play. Let me be quick to add
that this was with Lucene + custom code, not with Solr.
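
As a rough illustration of how those three throughput figures relate to each other, the
ratios can be worked out with a few lines of Python. The numbers are the ones quoted above;
the names `qps` and `speedup` are only illustrative and come from no benchmark code.

```python
# Throughput figures quoted in the message, in queries per second.
qps = {"harddisk": 200, "ssd": 774, "ram": 952}


def speedup(medium: str, baseline: str = "harddisk") -> float:
    """Return how many times faster `medium` is than `baseline`."""
    return qps[medium] / qps[baseline]


for medium, rate in qps.items():
    # Show each medium relative to harddisks and as a share of the RAM rate.
    print(f"{medium:9s} {rate:4d} q/s  "
          f"{speedup(medium):.2f}x harddisk  "
          f"{rate / qps['ram']:.0%} of RAM")
```

With these figures, SSDs come out at roughly 3.9 times the harddisk rate and about 81% of the
RAM rate, subject to the same warm-up and caching caveats as the raw numbers.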

> That would negate almost any disk benefits.

That depends very much on your setup. It takes a fair amount of time to copy 14GB from storage
into RAM, so an index fully in RAM would either be very static or require some logic to handle
updates and sync data in case of outages. I know there's some interesting work being done
with this, but as SSDs are a lot cheaper than RAM and fulfill our needs, it is not something
we pursue.
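
To put a rough number on "a fair amount of time", here is a back-of-the-envelope sketch of
how long a sequential read of a 14GB index might take. The read rates are hypothetical
placeholders, not measurements from the test described above, and `load_seconds` is an
illustrative helper.

```python
def load_seconds(index_gb: float, throughput_mb_per_s: float) -> float:
    """Seconds needed to read `index_gb` gigabytes at a sustained rate in MB/s."""
    return index_gb * 1024 / throughput_mb_per_s


# Assumed sustained read rates in MB/s (hypothetical, for illustration only).
assumed_rates = {"slow": 50, "medium": 100, "fast": 250}

for label, rate in assumed_rates.items():
    print(f"{label:6s} {rate:3d} MB/s -> {load_seconds(14, rate):.0f} s")
```

Under these assumptions the copy takes on the order of one to five minutes, before any
index opening or warm-up is counted.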
