lucene-solr-user mailing list archives

From Toke Eskildsen ...@statsbiblioteket.dk>
Subject Re: Hardware Specs Question
Date Fri, 03 Sep 2010 09:39:58 GMT
On Fri, 2010-09-03 at 03:45 +0200, Shawn Heisey wrote:
> On 9/2/2010 2:54 AM, Toke Eskildsen wrote:
> > We've done a fair amount of experimentation in this area (2007-era SSDs
> > vs. two 15.000 RPM harddisks in RAID 1 vs. two 10.000 RPM harddisks in
> > RAID 0). The harddisk setups never stood a chance for searching. With
> > current SSDs being faster than harddisks for writes too, they'll also
> > be better for index building, although not as impressive as for
> > searches. Old notes at http://wiki.statsbiblioteket.dk/summa/Hardware
> 
> How does it compare to six SATA drives in a Dell hardware RAID10?  

I'll have to extrapolate a lot here (also known as guessing).

You don't mention what kind of harddrives you're using, so let's say
15.000 RPM to err on the high-end side. Compared to the 2 drives @
15.000 RPM in RAID 1 we've experimented with, the difference is that the
striping allows for concurrency when the different reads are on
different physical drives (sorry if this is basic, I'm just trying to
establish a common understanding here).

Assuming reads land uniformly and independently on the 3 striped
harddrives, the chance for 2 concurrent reads to be on different drives
is 2/3, the chance for 3 concurrent reads to all be on different drives
is 2/9 and the chance for 3 concurrent reads to be on at least 2 drives
is 8/9. For the sake of argument, let's say that the 3 * striping gives
us double the concurrent I/O.
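
As a sanity check on estimates like these, here is a minimal sketch
that enumerates every way a handful of concurrent reads can land on a
set of stripes. It assumes each read hits a stripe uniformly and
independently of the others, which a real index will only approximate.
The function name placement_odds is made up for this illustration.

```python
from fractions import Fraction
from itertools import product


def placement_odds(num_stripes: int, num_reads: int) -> dict[int, Fraction]:
    """Return {distinct stripes hit: probability} for concurrent reads.

    Assumption (not from any measurement): every read lands on one of
    the stripes with equal probability, independently of the others.
    """
    # Every possible assignment of reads to stripes, all equally likely.
    outcomes = list(product(range(num_stripes), repeat=num_reads))
    weight = Fraction(1, len(outcomes))
    odds: dict[int, Fraction] = {}
    for outcome in outcomes:
        distinct = len(set(outcome))  # how many stripes are kept busy
        odds[distinct] = odds.get(distinct, Fraction(0)) + weight
    return odds


if __name__ == "__main__":
    for reads in (2, 3):
        for distinct, chance in sorted(placement_odds(3, reads).items()):
            print(f"{reads} reads on {distinct} stripe(s): {chance}")
```

The exact fractions matter less than the trend: with only 3 stripes,
concurrent reads collide on the same stripe often enough that the
speedup stays well below 3x.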

Taking my old measurements at face value and doubling the numbers for
the 15.000 RPM measurements, this would bring six 15.000 RPM SATA drives
in RAID 10 up to a throughput that is 1/3 - 2/3 of the SSD, depending on
how we measure.


Some general observations:

With long runtimes, the throughput for harddisks rises relative to the
SSD as the disk cache gets warmed. If there are frequent index updates
with deletions, the SSD gains more ground as it is not nearly as
dependent on disk cache as harddisks.

With small indexes, the difference between harddisks and SSD is
relatively small as the disk cache quickly gets filled. Consequently the
difference increases for large indexes.


One point to note for RAID is that it does not improve the speed of
single searches on a single index: It does not lower the seek time for a
single small I/O request and searching on a single index is done with a
number of small successive requests. If the performance problem is long
search time, RAID does not help (but in combination with sharding or
similar it will). If the problem is the number of concurrent searches,
RAID helps.
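
To make the latency versus throughput distinction concrete, here is a
toy model. The numbers (200 seeks per search, 5 ms per seek) are
hypothetical and not taken from my measurements, and the function
names are made up for the illustration. It ignores disk cache and
transfer time entirely.

```python
def single_search_ms(seeks_per_search: int, seek_ms: float) -> float:
    """Latency of one search on one index.

    The seeks are successive: each request must finish before the next
    one is issued, so extra drives cannot shorten this.
    """
    return seeks_per_search * seek_ms


def searches_per_second(seeks_per_search: int, seek_ms: float,
                        concurrent_streams: int) -> float:
    """Throughput when several independent searches run at once.

    concurrent_streams is the number of seeks the storage can serve in
    parallel (an idealised stand-in for what striping provides).
    """
    latency_ms = single_search_ms(seeks_per_search, seek_ms)
    return concurrent_streams * 1000.0 / latency_ms


if __name__ == "__main__":
    # Hypothetical figures, for illustration only.
    seeks, seek_ms = 200, 5.0
    print("latency per search (ms):", single_search_ms(seeks, seek_ms))
    for streams in (1, 2, 3):
        print(streams, "stream(s):",
              searches_per_second(seeks, seek_ms, streams), "searches/s")
```

In this model, adding streams raises searches per second but leaves
the latency of each search untouched. Sharding changes the picture
because it splits one search into parts that can seek in parallel.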

