lucene-java-user mailing list archives

From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Solid State Drives vs. RAMDirectory
Date Tue, 15 Apr 2008 04:26:21 GMT
Toke, this is *super* juicy information, very useful and educational.  Please do put this on
the Wiki.
There doesn't seem to be a benchmarking page on the Wiki yet, so I suggest you go to http://wiki.apache.org/lucene-java/LuceneBenchmarks,
create that page, and put everything you want and can share there.

Thanks!
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Toke Eskildsen <te@statsbiblioteket.dk>
To: java-user@lucene.apache.org
Sent: Thursday, March 13, 2008 7:03:44 AM
Subject: Solid State Drives vs. RAMDirectory

Time for another dose of inspiration for investigating Solid State
Drives. And no, I don't get percentages from the chip manufacturers :-)

This time I'll argue that there's little gain in using a RAMDirectory
over SSDs when performing searches, at least in our setting.


We've taken our production index of about 10 million documents / 37GB
and reduced it to 14GB by removing documents uniformly across the index.
A test with fairly simple searches was performed, using logged queries
from our production system (see the thread "Multiple Searchers" on this
mailing list for details) and extracting the content of a stored field
for the first 20 hits of each search.

On a dual-core Xeon machine with 24GB of RAM, the full index can be
loaded into RAM with a RAMDirectory. The following is the average speed
over 340,000 queries. In the log names, t2 signifies 2 threads with a
shared searcher and t2u signifies 2 threads with separate searchers.
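The thread token in the log names can be decoded mechanically. A minimal
Java sketch, assuming only the convention stated above (tN = N threads, a
trailing "u" = separate searchers); the other tokens (i14, v23, l23) are
not explained in the post and are deliberately left undecoded. The class
name LogName is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LogName {
    // Matches the thread token, e.g. "_t3_" or "_t3u_".
    // A trailing "u" marks separate searchers per thread.
    private static final Pattern THREADS = Pattern.compile("_t(\\d+)(u?)_");

    final int threads;
    final boolean separateSearchers;

    LogName(int threads, boolean separateSearchers) {
        this.threads = threads;
        this.separateSearchers = separateSearchers;
    }

    static LogName parse(String fileName) {
        Matcher m = THREADS.matcher(fileName);
        if (!m.find()) {
            throw new IllegalArgumentException("No thread token in " + fileName);
        }
        return new LogName(Integer.parseInt(m.group(1)), !m.group(2).isEmpty());
    }

    public static void main(String[] args) {
        LogName a = parse("metis_RAM_24GB_i14_v23_t2_l23.log");
        LogName b = parse("metis_RAM_24GB_i14_v23_t3u_l23.log");
        System.out.println(a.threads + " " + a.separateSearchers); // 2 false
        System.out.println(b.threads + " " + b.separateSearchers); // 3 true
    }
}
```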

metis_RAM_24GB_i14_v23_t1_l23.log       530.0 q/sec
metis_RAM_24GB_i14_v23_t2_l23.log       888.2 q/sec
metis_RAM_24GB_i14_v23_t2u_l23.log      983.9 q/sec
metis_RAM_24GB_i14_v23_t3_l23.log       843.1 q/sec
metis_RAM_24GB_i14_v23_t3u_l23.log      996.1 q/sec
metis_RAM_24GB_i14_v23_t4_l23.log       869.8 q/sec
metis_RAM_24GB_i14_v23_t4u_l23.log      943.4 q/sec

As can be seen, the best-performing configuration was 3 threads with
separate searchers. The time for loading the index into RAM is not
included in these numbers.
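The difference between a shared searcher and separate searchers comes
down to whether worker threads share one searcher instance or each get
their own. The actual test harness is not shown in the post, so this is
only a sketch of the two modes: FakeSearcher is a hypothetical stand-in
for a real searcher opened on the index, and ThreadLocal gives each pool
thread its own instance in the "u" mode.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class SearcherModes {
    // Hypothetical stand-in for a searcher; real code would open one on the index.
    static class FakeSearcher {
        int search(String query) {
            return query.length();
        }
    }

    // Runs `queries` searches on `threads` workers and returns how many
    // distinct searcher instances ended up being used.
    static int run(int threads, boolean separateSearchers, int queries) {
        Supplier<FakeSearcher> factory = FakeSearcher::new;
        FakeSearcher shared = factory.get();
        // "u" mode: every worker thread lazily creates its own searcher.
        ThreadLocal<FakeSearcher> perThread = ThreadLocal.withInitial(factory);
        Set<FakeSearcher> used = Collections.synchronizedSet(
                Collections.newSetFromMap(new IdentityHashMap<FakeSearcher, Boolean>()));

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < queries; i++) {
            String query = "query " + i;
            pool.execute(() -> {
                FakeSearcher searcher = separateSearchers ? perThread.get() : shared;
                used.add(searcher);
                searcher.search(query);
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return used.size();
    }

    public static void main(String[] args) {
        System.out.println(run(3, false, 1_000)); // shared searcher: 1
        System.out.println(run(3, true, 1_000));  // separate searchers: 3
    }
}
```

In a real setup the per-thread searchers would each hold their own
resources and need to be closed when the index is replaced.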


Now for the interesting part: Reducing the amount of available RAM to
3GB and using SSDs instead.

metis_MTRONSSD_RAID0_3GB_i14_v23_t1_l23.log     433.7 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t2_l23.log     573.4 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t2u_l23.log    783.4 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t3_l23.log     459.7 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t3u_l23.log    808.5 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t4_l23.log     455.3 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t4u_l23.log    809.0 q/sec
metis_MTRONSSD_RAID0_3GB_i14_v23_t5_l23.log     454.4 q/sec

In comparison, the same test with 3GB of RAM on 15,000 RPM hard disks in
RAID 1 gave these numbers:

metis_15000RPM_RAID1_3GB_i14_v23_t1_l23.log     176.6 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t2_l23.log     188.6 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t2u_l23.log    247.1 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t3_l23.log     178.4 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t3u_l23.log    276.1 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t4_l23.log     177.8 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t4u_l23.log    259.3 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t5_l23.log     178.5 q/sec

SSDs do not match RAMDirectory in speed for this setup, but 81% is not
bad, especially when compared to the 28% for conventional hard disks.
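The percentages appear to be the best throughput of each storage type
divided by the best RAMDirectory throughput (996.1 q/sec, t3u). A small
Java sketch of that arithmetic using the figures from the tables; the
class and method names are illustrative only.

```java
public class RelativeSpeed {
    // Throughput as a whole-number percentage of a baseline throughput.
    static long percentOf(double qps, double baselineQps) {
        return Math.round(100.0 * qps / baselineQps);
    }

    public static void main(String[] args) {
        double ramBest = 996.1; // RAMDirectory, 24GB, t3u
        System.out.println("SSD: " + percentOf(809.0, ramBest) + "%"); // SSD: 81%
        System.out.println("HDD: " + percentOf(276.1, ramBest) + "%"); // HDD: 28%
    }
}
```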


Performing the same tests with 8GB of available RAM on the machine gave
the following results:

metis_MTRONSSD_RAID0_8GB_i14_v23_t1_l23.log     431.9 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t2_l23.log     594.3 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t2u_l23.log    807.7 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t3_l23.log     472.3 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t3u_l23.log    817.6 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t4_l23.log     464.4 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t4u_l23.log    828.8 q/sec
metis_MTRONSSD_RAID0_8GB_i14_v23_t5_l23.log     471.2 q/sec

metis_15000RPM_RAID1_8GB_i14_v23_t1_l23.log     199.4 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t2_l23.log     220.4 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t2u_l23.log    312.4 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t3_l23.log     203.8 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t3u_l23.log    370.9 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t4_l23.log     203.1 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t4u_l23.log    408.1 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t5_l23.log     202.5 q/sec

Switching to 12GB...

metis_MTRONSSD_RAID0_12GB_i14_v23_t1_l23.log    438.8 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t2_l23.log    587.8 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t2u_l23.log   819.9 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t3_l23.log    476.4 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t3u_l23.log   833.7 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t4_l23.log    465.4 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t4u_l23.log   835.2 q/sec
metis_MTRONSSD_RAID0_12GB_i14_v23_t5_l23.log    467.1 q/sec

metis_15000RPM_RAID1_12GB_i14_v23_t1_l23.log    198.6 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t2_l23.log    219.1 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t2u_l23.log   309.4 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t3_l23.log    204.1 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t3u_l23.log   362.4 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t4_l23.log    202.3 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t4u_l23.log   406.6 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t5_l23.log    201.2 q/sec


Extracting the fastest configurations for the different RAM amounts:

RAMDirectory (24GB of RAM):
metis_RAM_24GB_i14_v23_t3u_l23.log      996.1 q/sec

3GB of RAM:
metis_MTRONSSD_RAID0_3GB_i14_v23_t4u_l23.log    809.0 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t3u_l23.log    276.1 q/sec

8GB of RAM:
metis_MTRONSSD_RAID0_8GB_i14_v23_t4u_l23.log    828.8 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t4u_l23.log    408.1 q/sec

12GB of RAM:
metis_MTRONSSD_RAID0_12GB_i14_v23_t4u_l23.log   835.2 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t4u_l23.log   406.6 q/sec
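Picking the fastest configuration out of a set of runs is a simple max
over the reported throughput. A Java sketch, assuming the results have
already been read into a map from thread token to q/sec (how the original
logs were parsed is not described in the post):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Fastest {
    // Returns the entry with the highest queries/second.
    static Map.Entry<String, Double> fastest(Map<String, Double> results) {
        return results.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .orElseThrow(() -> new IllegalArgumentException("no results"));
    }

    public static void main(String[] args) {
        // MTRON SSD, RAID 0, 3GB of RAM (figures from the table above).
        Map<String, Double> ssd3 = new LinkedHashMap<>();
        ssd3.put("t1", 433.7);
        ssd3.put("t2", 573.4);
        ssd3.put("t2u", 783.4);
        ssd3.put("t3", 459.7);
        ssd3.put("t3u", 808.5);
        ssd3.put("t4", 455.3);
        ssd3.put("t4u", 809.0);
        ssd3.put("t5", 454.4);
        Map.Entry<String, Double> best = fastest(ssd3);
        System.out.println(best.getKey() + " " + best.getValue()); // t4u 809.0
    }
}
```

Note that t3u (808.5) and t4u (809.0) are practically tied here, so the
"fastest" label says little about 3 versus 4 threads on the SSDs.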

As can be seen, the SSDs benefit somewhat from running with 8GB, while
the hard disks benefit a lot. A graph of queries/second over time clearly
shows that the performance of the hard disks relative to the RAM speed
climbs steadily, while the SSD speed climbs very little, if at all. This
tells me that the speed of SSD-stored indexes is fairly independent of
the amount of RAM available for cache.
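The "somewhat" versus "a lot" can be put in numbers by comparing the
fastest configurations at 3GB and 8GB. A short Java sketch of that
percentage change; the class name is illustrative only.

```java
public class RamGain {
    // Percentage change in throughput going from `before` to `after`.
    static double gainPercent(double before, double after) {
        return 100.0 * (after - before) / before;
    }

    public static void main(String[] args) {
        System.out.printf("SSD 3GB -> 8GB: %+.1f%%%n", gainPercent(809.0, 828.8)); // +2.4%
        System.out.printf("HDD 3GB -> 8GB: %+.1f%%%n", gainPercent(276.1, 408.1)); // +47.8%
    }
}
```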

Upping the amount to 12GB doesn't change much. Clearly 8GB is "enough"
for our 14GB index with our queries.


At the risk of making all this unclear, let's ignore the first 5,000
queries and cut off the statistics after 50,000 queries. This mimics a
setting with warm-up and a not-so-stale index that gets replaced once in
a while. Extracting the fastest configurations for the different RAM
amounts gives us:
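The post does not say how the windowed numbers were computed. One
plausible way, sketched in Java under the assumption that the log holds
each query's completion time in nanoseconds since the start of the run,
sorted in ascending order (a hypothetical format), is to count the
queries in the window [skip, cutoff) and divide by the time between the
end of the last skipped query and the end of the last counted one:

```java
public class WindowedThroughput {
    // Queries per second for queries [skip, cutoff). Assumes completionNanos
    // is sorted ascending and measured from the start of the run.
    static double windowedQps(long[] completionNanos, int skip, int cutoff) {
        if (skip < 0 || cutoff <= skip || cutoff > completionNanos.length) {
            throw new IllegalArgumentException("bad window");
        }
        // The window opens when the last skipped query finished
        // (or at time 0 when nothing is skipped).
        long start = skip == 0 ? 0L : completionNanos[skip - 1];
        long end = completionNanos[cutoff - 1];
        return (cutoff - skip) / ((end - start) / 1e9);
    }

    public static void main(String[] args) {
        // Synthetic run: one query finishes every millisecond.
        long[] done = new long[60_000];
        for (int i = 0; i < done.length; i++) {
            done[i] = (i + 1) * 1_000_000L;
        }
        System.out.println(windowedQps(done, 5_000, 50_000)); // 1000.0
    }
}
```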

RAMDirectory (24GB of RAM):
metis_RAM_24GB_i14_v23_t2u_l23.log      867.3 q/sec

3GB of RAM:
metis_MTRONSSD_RAID0_3GB_i14_v23_t3u_l23.log    663.2 q/sec
metis_15000RPM_RAID1_3GB_i14_v23_t4u_l23.log    163.4 q/sec

8GB of RAM:
metis_MTRONSSD_RAID0_8GB_i14_v23_t4u_l23.log    653.6 q/sec
metis_15000RPM_RAID1_8GB_i14_v23_t4u_l23.log    163.4 q/sec

12GB of RAM:
metis_MTRONSSD_RAID0_12GB_i14_v23_t3u_l23.log   653.6 q/sec
metis_15000RPM_RAID1_12GB_i14_v23_t4u_l23.log   163.4 q/sec

Yes, getting 163.4 q/sec three times is a funny coincidence. I
double-checked and looked at the graphs: up to about 60,000 queries, the
graphs for the 15,000 RPM disks are virtually identical; then the one for
3GB of RAM levels off, while the ones for 8 and 12GB remain virtually
identical and keep climbing. For the SSDs, the graph for 3GB is a bit
higher than the others until about 50,000-60,000 queries, then a bit
lower for the rest.

For this scenario, the speed of SSDs relative to RAMDirectory drops to
75-76%, while the speed of hard disks drops to 19%, fairly independent of
RAM. In other words: upping the amount of RAM does not help us when the
index is replaced before we pass 50,000 queries.

Another observation: the more often we replace our index, the better
SSDs look compared to hard disks. On the flip side, for long run-times
with an unchanged index, hard disks seem the better choice, at least from
an economic point of view.


Grand conclusion? Getting 3/4 of the performance of RAMDirectory by
using SSDs on a machine with much less RAM seems like a good deal if
high performance per machine is needed.


Remember, these are all searches on an optimized index. They were run on
the corpus from the Danish State and University Library and should be
seen as nothing more than inspiration.

Still pending are experiments with updating large indexes on SSDs. My
guess is that there won't be anywhere near the same speed increase as
for pure searches. That will have to wait a bit, though, as it requires
Real Work, as opposed to just starting a script.


NB: I'd like to post my findings on the Lucene wiki, but I have been
unable to locate the appropriate page. Could someone please point me in
the right direction?


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org




