incubator-cassandra-user mailing list archives

From Alaa Zubaidi <alaa.zubaidi@pdf.com>
Subject Re: SSD vs. HDD
Date Wed, 03 Nov 2010 23:10:49 GMT
Thanks for the reply.
I am getting timeout errors while reading.
I have 5 CFs, but two of them carry high write/read traffic.
The data is organized in time-series rows: in CF1, new rows are read 
every 10 seconds and then deleted whole, while in CF2 the rows are read 
in slices over different time ranges and eventually deleted, perhaps 
after a few hours.
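
For reference, the read/delete loop on CF1 looks roughly like this sketch 
(assuming a recent pycassa client; the keyspace name, CF name, row-key 
scheme, and process() handler are hypothetical, not our actual schema):

    import time
    import pycassa

    # Hypothetical connection details -- adjust to the real cluster.
    pool = pycassa.ConnectionPool('MyKeyspace', ['localhost:9160'])
    cf1 = pycassa.ColumnFamily(pool, 'CF1')

    while True:
        # One time-series row per 10-second window (hypothetical key scheme).
        row_key = 'window-%d' % (int(time.time()) // 10)
        try:
            columns = cf1.get(row_key)  # read the whole new row
            process(columns)            # hypothetical downstream handler
            cf1.remove(row_key)         # then delete the entire row
        except pycassa.NotFoundException:
            pass                        # nothing written for this window yet
        time.sleep(10)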

Thanks

On 11/3/2010 1:58 PM, Tyler Hobbs wrote:
> SSDs will generally not improve your write performance very much, but they
> can significantly improve read performance.
>
> You do *not* want to waste an SSD on the commitlog drive, as even a slow HDD
> can write sequentially very quickly.  For the data drive, they might make
> sense.
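>
> Concretely, that split looks like the following in a 0.7-style
> cassandra.yaml (the paths are examples only; on 0.6 the equivalents are
> the CommitLogDirectory and DataFileDirectory elements in storage-conf.xml):
>
>     # commitlog on a cheap HDD, data files on the SSD
>     commitlog_directory: /mnt/hdd/cassandra/commitlog
>     data_file_directories:
>         - /mnt/ssd/cassandra/data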
>
> As Jonathan discusses below, it has a lot to do with your access patterns.  If
> you either (1) delete parts of rows, (2) update parts of rows, or (3) insert
> new columns into existing rows frequently, you'll end up with rows spread
> across several SSTables (which are on disk).  This means that each read may
> require several seeks, which are very slow for HDDs, but are very quick for
> SSDs.
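>
> One way to gauge that fragmentation (assuming nodetool is available on
> the node) is:
>
>     nodetool -h localhost cfstats
>
> which reports an "SSTable count" per column family; a high count on a
> hot CF means more potential seeks per uncached read.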
>
> Of course, the randomness of what rows you access is also important, but
> Jonathan did a good job of covering that.  Don't forget about the effects of
> caching here, too.
>
> The only way to tell if it is cost-effective is to test your particular
> access patterns (using a configured stress.py test or, preferably, your
> actual application).
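>
> As a rough illustration, a read-heavy run with the contrib py_stress
> tool might look like this (hosts, key counts, and thread counts are
> example values only):
>
>     # populate first, then measure read latency/throughput
>     stress.py -d host1,host2 -o insert -n 1000000 -t 50
>     stress.py -d host1,host2 -o read   -n 1000000 -t 50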
>
> - Tyler
>
> On Wed, Nov 3, 2010 at 3:44 PM, Jonathan Shook <jshook@gmail.com> wrote:
>
>> SSDs are not reliable after a certain number of writes (relatively low
>> compared to spinning disks).
>> They may significantly boost performance if used for the "journal"
>> storage, but will suffer short lifetimes under highly random write
>> patterns.
>>
>> In general, plan to replace them frequently. Whether they are worth
>> it, weighing the performance improvement against the cost of replacement
>> x hardware x logistics, is generally a calculus problem. It's difficult
>> to make a generic rationale for or against them.
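>>
>> As a back-of-envelope version of that calculus (all numbers here are
>> hypothetical): a 100 GB MLC drive rated for ~5,000 program/erase cycles
>> can absorb roughly 100 GB x 5,000 = 500 TB of writes. At a sustained
>> 50 MB/s of writes, that is 5e14 / 5e7 = 1e7 seconds, or about 115 days,
>> and write amplification from random writes shortens it further.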
>>
>> You might be better off in general by throwing more memory at your
>> servers, and isolating your random access from your journaled data.
>> Is there any pattern to your reads and writes/deletes? If it is fully
>> random across your keys, then you have the worst-case scenario.
>> Sometimes you can impose access patterns or structural patterns in
>> your app which make caching more effective.
>>
>> Good questions to ask about your data access:
>> Is there a "user session" which shows an access pattern to proximal data?
>> Are there sets of accesses that always happen close together?
>> Are there keys or maps which add extra indirection?
>>
>> I'm not familiar with your situation. I was just providing some general
>> ideas.
>>
>> Jonathan Shook
>>
>> On Wed, Nov 3, 2010 at 2:32 PM, Alaa Zubaidi <alaa.zubaidi@pdf.com> wrote:
>>> Hi,
>>> We have continuous high-throughput writes, reads, and deletes, and we are
>>> trying to find the best hardware.
>>> Does using SSDs for Cassandra improve performance? Did anyone compare SSD
>>> vs. HDD? And any recommendations on SSDs?
>>>
>>> Thanks,
>>> Alaa
>>>
>>>

-- 
Alaa Zubaidi
PDF Solutions, Inc.
333 West San Carlos Street, Suite 700
San Jose, CA 95110  USA
Tel: 408-283-5639 (or 408-280-7900 x5639)
fax: 408-938-6479
email: alaa.zubaidi@pdf.com


