cassandra-user mailing list archives

From Roland Hänel <rol...@haenel.me>
Subject Re: Can Cassandra make real use of several DataFileDirectories?
Date Mon, 26 Apr 2010 19:25:02 GMT
Ryan, I agree with you on the hot spots. For physical disk performance,
however, even the worst-case hot spot is no worse than RAID0: in a hot-spot
scenario, 90% of your reads might go to one hard drive, but with RAID0,
100% of your reads will go to *all* hard drives.
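
To put rough numbers on the comparison above, here is a minimal sketch, not
a measurement. It assumes, as the argument does, that every RAID0 read
touches every disk; in practice that depends on the stripe size relative to
the read size. The function names, the 1000-read workload and the 90/5/5
split are hypothetical and chosen only for illustration.

```python
def jbod_reads_per_disk(total_reads, share_percent):
    """Reads landing on each disk when files live on separate disks.

    share_percent lists the percentage of reads served by each disk
    (hypothetical hot-spot distribution, must sum to 100).
    """
    return [total_reads * p // 100 for p in share_percent]


def raid0_reads_per_disk(total_reads, n_disks):
    """Reads landing on each disk under the assumption stated above:
    every read touches every disk in the stripe set."""
    return [total_reads] * n_disks


total_reads = 1000
hot_spot = [90, 5, 5]  # assumed worst case: 90% of reads hit one disk

jbod = jbod_reads_per_disk(total_reads, hot_spot)
raid0 = raid0_reads_per_disk(total_reads, len(hot_spot))

# Compare the busiest disk in each layout.
print("JBOD busiest disk:", max(jbod))
print("RAID0 busiest disk:", max(raid0))
```

Under that assumption the busiest separate disk serves fewer reads than any
disk in the RAID0 set; if reads are smaller than one stripe unit, the RAID0
figure would be lower and the comparison changes.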

But you're right, individual disks might waste up to 50% of your total disk
space...

I came to consider this idea because Hadoop DFS explicitly recommends
different disks. But the design is not exactly the same: they don't have to
deal with very big files on the native FS layer.

-Roland



2010/4/26 Ryan King <ryan@twitter.com>

> 2010/4/26 Roland Hänel <roland@haenel.me>:
> > Hm... I understand that RAID0 would help to create a bigger pool for
> > compactions. However, it might impact read performance: if I have
> > several CFs (with their SSTables), random read requests for the CF
> > files that are on separate disks will behave nicely - however if it's
> > RAID0 then a random read on any file will create a random read on all
> > of the hard disks. Correct?
>
> Without RAID0 you will end up with hot spots (a compaction could end
> up putting a large SSTable on one disk, while the others have smaller
> SSTables). If you have many CFs this might average out, but it might
> not and there are no guarantees here. I'd recommend RAID0 unless you
> have reason to do something else.
>
> -ryan
>
