lucene-java-user mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject Re: Scaling out/up or a mix
Date Mon, 29 Jun 2009 09:41:33 GMT
On Sat, 2009-06-27 at 00:00 +0200, Marcus Herou wrote:
> We currently have about 90M documents and it is increasing rapidly so
> getting into the G+ document range is not going to be too far away.

We've performed fairly extensive tests regarding hardware for searches
and some minor tests on hardware for indexing. The tests were primarily
with regard to cores, RAM and storage (so no focus on CPU-speed or
bus-speed). Our "standard" index was 37GB with 9 million documents,
although we did try our hand at running 40 million documents on a
single machine.

You might want to take a look at some unordered notes and graphs from
our tests: http://wiki.statsbiblioteket.dk/summa/Hardware

> 2. What is the most important hardware aspect when it comes to searching
> documents in my setup ? (result-set is limited to return only the top 10
> matches with page handling)
> 2.1 Is it disk read throughput ? (sequential or random-io ?)
> 2.2 Is it RAM ?
> 2.3 Is it CPU ?

For searches, random access is king, so go for Solid State Drives.
As there is a lot of crap out there, be sure to read some reviews.
The Intel X25 seems like a safe bet right now.

While not quite on par with holding the full index in RAM, SSDs come
quite close (744 searches/second vs. 951 searches/second in one of our
tests with a standard RAMDirectory). The same test for 2 * 15,000 RPM
conventional harddisks in RAID 1 gave us ~200 searches/second. This is
of course highly dependent on the index.
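
To put the three figures above in proportion, here is a minimal Python
sketch. The dictionary keys and the helper name relative_throughput are
illustrative assumptions, and ~200 is treated as exactly 200; only the
numbers themselves come from the test.

```python
# Searches/second from the test quoted above (~200 is treated as 200).
RESULTS = {
    "ram": 951,  # full index in a standard RAMDirectory
    "ssd": 744,  # index on SSDs
    "hdd": 200,  # 2 conventional harddisks in RAID 1 (approximate)
}


def relative_throughput(results, baseline):
    """Return each setup's throughput as a fraction of the baseline setup."""
    base = results[baseline]
    return {name: rate / base for name, rate in results.items()}


if __name__ == "__main__":
    for name, share in relative_throughput(RESULTS, "ram").items():
        print(f"{name}: {share:.0%} of RAM throughput")
    print(f"SSD vs. harddisks: {RESULTS['ssd'] / RESULTS['hdd']:.1f}x")
```

For this one test, that puts SSDs at roughly 78% of the RAM figure and
a bit under four times the harddisk figure. Other indexes will give
other ratios.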

As opposed to conventional harddisks, SSDs aren't nearly as reliant on
RAM for caching. On the other hand, SSDs are capable of serving larger
indexes than conventional harddisks and as such, more RAM will be needed
for the JVM with the Lucene searcher.

Our pick for the 50 million documents, 150-200GB of indexes per machine
range was 4 core Intel Xeons, 16GB RAM, 4*64GB SSDs for the index
(RAID0ing them does not change the speed significantly, we just do it to
get a single volume) and conventional harddisks for storage.
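
For a back-of-the-envelope estimate of how many such machines a given
corpus needs, a sketch like the following may help. It assumes that the
50 million documents per machine figure scales linearly and that the
documents are comparable in size, neither of which is guaranteed. The
helper names machines_needed and index_bytes_per_doc are hypothetical.

```python
import math

DOCS_PER_MACHINE = 50_000_000  # per-machine range quoted above


def machines_needed(total_docs, docs_per_machine=DOCS_PER_MACHINE):
    """Number of search machines, assuming documents spread evenly."""
    return math.ceil(total_docs / docs_per_machine)


def index_bytes_per_doc(index_gb, docs):
    """Average index size per document in bytes (1 GB taken as 10**9 bytes)."""
    return index_gb * 10**9 / docs


if __name__ == "__main__":
    print(machines_needed(90_000_000))      # the 90M documents in the question
    print(machines_needed(1_000_000_000))   # the "G+ document range"
    print(round(index_bytes_per_doc(37, 9_000_000)))  # the 37GB / 9M index
```

Under those assumptions, 90M documents fit on 2 machines and 1G
documents need 20. Treat this as a starting point for testing, not as
a sizing guarantee.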

Just as Eric Bowman discovered, processing power easily becomes the
bottleneck when switching to SSDs. This happened for us too and
triggered a great deal of profiling (VisualVM is free, _very_ easy to use
and helps tremendously with this) to pinpoint where the CPUs used their
energy.

Regards,
Toke Eskildsen

