incubator-cassandra-user mailing list archives

From: Jonathan Shook <>
Subject: Re: SSD vs. HDD
Date: Wed, 03 Nov 2010 23:24:27 GMT
Ah. Point taken on the random access SSD performance. I was trying to
emphasize the relative failure rates in the two scenarios. I didn't
mean to imply that SSD random access performance was not a likely
improvement here, just that it is a complicated trade-off in the
grand scheme of things. Thanks for catching my goof.

On Wed, Nov 3, 2010 at 3:58 PM, Tyler Hobbs <> wrote:
> SSDs will not generally improve your write performance very much, but they
> can significantly improve read performance.
> You do *not* want to waste an SSD on the commitlog drive, as even a slow HDD
> can write sequentially very quickly.  For the data drive, they might make
> sense.
> As Jonathan mentioned, it has a lot to do with your access patterns.  If
> you frequently (1) delete parts of rows, (2) update parts of rows, or (3)
> insert new columns into existing rows, you'll end up with rows spread
> across several SSTables (which are on disk).  This means that each read may
> require several seeks, which are very slow for HDDs but very quick for
> SSDs.
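> To see that gap for yourself, here is a minimal Python sketch (the path and
> file size are assumptions; it only shows a real difference if the file is
> much larger than RAM, since otherwise the OS page cache hides the seeks):
>
>     import os, random, time
>
>     PATH = "/data/disk_test.bin"  # assumption: a path on the drive under test
>     SIZE = 4 * 1024 ** 3          # assumption: 4 GB; should exceed RAM
>     BLOCK = 4096
>
>     # Sequential writes, like the commitlog: append fixed-size blocks.
>     start = time.time()
>     with open(PATH, "wb") as f:
>         for _ in range(SIZE // BLOCK):
>             f.write(b"\0" * BLOCK)
>         f.flush()
>         os.fsync(f.fileno())
>     print("sequential write: %.1f MB/s"
>           % (SIZE / 1024.0 ** 2 / (time.time() - start)))
>
>     # Random reads, like a row fragmented across SSTables: seek, then read.
>     start = time.time()
>     with open(PATH, "rb") as f:
>         for _ in range(10000):
>             f.seek(random.randrange(SIZE // BLOCK) * BLOCK)
>             f.read(BLOCK)
>     print("random read: %.2f ms average"
>           % ((time.time() - start) / 10000 * 1000))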
> Of course, the randomness of what rows you access is also important, but
> Jonathan did a good job of covering that.  Don't forget about the effects of
> caching here, too.
> The only way to tell if it is cost-effective is to test your particular
> access patterns (using a synthetic test configured to match your workload
> or, preferably, your actual application).
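> If your client is Python, a timing harness along these lines can do that (a
> sketch assuming a recent pycassa, a keyspace 'Keyspace1', a column family
> 'Standard1', and a uniform-random key distribution -- substitute your own
> schema and your app's real key distribution):
>
>     import random, time
>     import pycassa
>
>     pool = pycassa.ConnectionPool('Keyspace1', ['localhost:9160'])
>     cf = pycassa.ColumnFamily(pool, 'Standard1')
>
>     # Replay reads; uniform-random keys shown, use your real pattern instead.
>     latencies = []
>     for _ in range(10000):
>         key = 'row%d' % random.randrange(1000000)
>         start = time.time()
>         try:
>             cf.get(key, column_count=10)
>         except pycassa.NotFoundException:
>             pass  # misses still cost a disk check; keep timing them
>         latencies.append(time.time() - start)
>
>     latencies.sort()
>     print('median: %.2f ms, 99th percentile: %.2f ms' % (
>         latencies[len(latencies) // 2] * 1000,
>         latencies[int(len(latencies) * 0.99)] * 1000))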
> - Tyler
> On Wed, Nov 3, 2010 at 3:44 PM, Jonathan Shook <> wrote:
>> SSDs become unreliable after a number of writes that is relatively low
>> compared to spinning disks.
>> They may significantly boost performance if used for the "journal"
>> storage, but will suffer short lifetimes under highly random write
>> patterns.
>> In general, plan to replace them frequently. Whether they are worth
>> it is ultimately a cost calculation: the performance improvement weighed
>> against the cost of replacement, hardware, and logistics. It's difficult
>> to make a generic case for or against them.
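>> To make that concrete, here is a back-of-envelope lifetime estimate in
>> Python. Every figure below is a placeholder assumption; plug in your
>> drive's spec sheet and your measured write rate:
>>
>>     # All numbers below are assumed examples, not recommendations.
>>     capacity_gb = 160           # drive capacity
>>     pe_cycles = 10000           # rated program/erase cycles per cell
>>     write_amplification = 3.0   # internal write inflation from random patterns
>>     write_rate_mb_s = 20.0      # sustained host write rate
>>
>>     host_writable_gb = capacity_gb * pe_cycles / write_amplification
>>     seconds = host_writable_gb * 1024 / write_rate_mb_s
>>     print("estimated lifetime: %.1f years" % (seconds / (3600 * 24 * 365)))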
>> You might be better off in general by throwing more memory at your
>> servers and isolating your random access from your journaled data.
>> Is there any pattern to your reads and writes/deletes? If access is fully
>> random across your keys, then you have the worst-case scenario.
>> Sometimes you can impose access patterns or structural patterns in
>> your app which make caching more effective (see the sketch after the
>> questions below).
>> Good questions to ask about your data access:
>> Is there a "user session" which shows an access pattern to proximal data?
>> Are there sets of access which always happen close together?
>> Are there keys or maps which add extra indirection?
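>> For instance (a sketch with made-up names -- the user id and the daily
>> bucket are illustrative, not from your schema): if reads cluster by user
>> session, bake that into the row key so the whole session reads from one
>> row, which stays hot in the cache instead of scattering seeks across
>> many rows:
>>
>>     import datetime
>>
>>     def event_row_key(user_id, day):
>>         # One row per user per day: columns read together live together.
>>         return '%s:%s' % (user_id, day.strftime('%Y%m%d'))
>>
>>     key = event_row_key('user42', datetime.date(2010, 11, 3))
>>     # Insert each event as a column under this one key, e.g.:
>>     #   cf.insert(key, {event_timestamp: event_value})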
>> I'm not familiar with your situation. I was just providing some general
>> ideas.
>> Jonathan Shook
>> On Wed, Nov 3, 2010 at 2:32 PM, Alaa Zubaidi <> wrote:
>> > Hi,
>> > We have continuous, high-throughput writes, reads, and deletes, and we
>> > are trying to find the best hardware.
>> > Does using SSDs for Cassandra improve performance? Has anyone compared
>> > SSDs vs. HDDs? And are there any recommendations on SSDs?
>> >
>> > Thanks,
>> > Alaa
>> >
>> >
