cassandra-user mailing list archives

From Roland Hänel <>
Subject Re: Can Cassandra make real use of several DataFileDirectories?
Date Mon, 26 Apr 2010 20:09:58 GMT
RAID0 decreases the performance of multiple, concurrent random reads because
for each read request (I assume that at least a couple of stripe sizes are
read), all hard disks are involved in that read.

Consider the following example: you want to read 1MB out of each of two
files.

a) both files are on the same RAID0 of two disks. For the first 1MB read
request, both disks contain some stripes of this request, both disks have to
move their heads to the correct location and do the read. The second read
request has to wait until the first one finishes, because it is served from
the same disks and depends on the same disk heads.

b) the files are on separate disks. Both reads can be done at the same time,
because the disk heads can move independently.
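
To put rough numbers on cases a) and b), here is a minimal back-of-the-envelope
sketch. It assumes the 8ms access time used in this thread and a transfer rate
of 100MB/s per disk (that rate is my assumption, not a measurement), and it
ignores caches, command queueing and the RAID controller. The function names
are made up for this illustration.

```python
ACCESS_MS = 8.0    # head positioning time per request (figure used in this thread)
MB_PER_S = 100.0   # ASSUMED sequential transfer rate of a single disk


def service_ms(size_mb, disks_sharing_read):
    """Time to serve one read that is striped over `disks_sharing_read` disks."""
    transfer_ms = size_mb / (disks_sharing_read * MB_PER_S) * 1000.0
    return ACCESS_MS + transfer_ms


def raid0_finish_ms(n_reads, size_mb, n_disks):
    """Case a): every read occupies all heads, so reads are served in turn."""
    return n_reads * service_ms(size_mb, n_disks)


def separate_finish_ms(reads_per_disk, size_mb):
    """Case b): each disk serves its own queue; the disks work in parallel."""
    return max(reads_per_disk) * service_ms(size_mb, 1)


if __name__ == "__main__":
    # Two 1MB reads, two disks.
    print("RAID0:    %.1f ms" % raid0_finish_ms(2, 1.0, 2))
    print("separate: %.1f ms" % separate_finish_ms([1, 1], 1.0))
```

Under these assumptions the two reads finish after about 26ms on the RAID0 and
after about 18ms on separate disks. The smaller the reads, the more the access
time dominates and the bigger the gap gets.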

Or look at it this way: if you issue a read request on a RAID0, and your
disks have 8ms access time, then after the read request, the whole RAID0 is
completely blocked for 8ms. If you handle the disks independently, only the
disk containing the file is blocked.
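
The same argument as a rough upper bound on random reads per second, again
only a sketch: it assumes small reads where the 8ms access time dominates and
transfer time can be ignored, and the function name is made up for this
illustration. The busiest disk limits the whole setup, so what matters is the
share of all reads that the busiest disk has to take part in.

```python
def max_random_reads_per_s(access_ms, busiest_disk_share):
    """Upper bound on total random reads/s when access time dominates.

    busiest_disk_share is the fraction of all reads that involve the
    busiest disk (1.0 for RAID0, since every read moves every head).
    """
    per_disk_limit = 1000.0 / access_ms   # reads/s a single disk can position for
    return per_disk_limit / busiest_disk_share


if __name__ == "__main__":
    access_ms = 8.0  # figure used in this thread
    print("RAID0, any number of disks:  ", max_random_reads_per_s(access_ms, 1.0))
    print("2 separate disks, even load: ", max_random_reads_per_s(access_ms, 0.5))
    print("2 separate disks, 90% hot:   ", max_random_reads_per_s(access_ms, 0.9))
```

In this simplified model even the 90% hot spot case comes out slightly ahead
of the RAID0, which is the point I was trying to make about hot spots.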

RAID0 has its advantages of course. Streaming reads/writes (e.g. during a
compaction) will be extremely fast.


2010/4/26 Paul Prescod <>

> 2010/4/26 Roland Hänel <>:
> > Ryan, I agree with you on the hot spots, however for the physical disk
> > performance, even the worst case hot spot is not worse than RAID0: in a
> > hot spot scenario, it might be that 90% of your reads go to one hard
> > drive. But with RAID0, 100% of your reads will go to *all* hard drives.
> RAID0 is designed specifically to improve performance (both latency
> and bandwidth). I'm unclear about why you think it would decrease
> performance. Perhaps you're thinking of another RAID type?
>  Paul Prescod
