incubator-cassandra-user mailing list archives

From Anthony Molinaro <antho...@alumni.caltech.edu>
Subject Re: Can Cassandra make real use of several DataFileDirectories?
Date Mon, 26 Apr 2010 21:15:51 GMT
I think reading all the disks might be the worst case. If your
block size is large enough to hold an entire row, you should only have to
read one disk to get that data.

I, for instance, stopped using multiple data directories and instead use
a RAID0.  The number of blocks read is not the same for all the disks
as you suggest it would be if every disk was involved in every transaction.

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda1             11.80         1.60       105.60          8        528
sdb              17.20       867.20         0.00       4336          0
sdc               2.60         0.00       155.20          0        776
sdd              16.40       796.80         0.00       3984          0
sde              21.80      1113.60         8.00       5568         40
md0              56.00      2777.60         8.00      13888         40

sdb, sdd and sde are striped together as md0 on an EC2 xlarge instance, and
the number of blocks read differs from disk to disk.
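
To make the imbalance concrete, here is a minimal sketch that computes each
disk's share of the md0 read rate. The Blk_read/s figures are copied from the
iostat output above. Treating sdb, sdd and sde as the md0 members is an
assumption, based on their read rates adding up to the md0 figure. The
variable names are illustrative only.

```python
# Blk_read/s values copied from the iostat output above.
# Assumption: sdb, sdd and sde are the three members of md0.
blk_read_per_s = {"sdb": 867.20, "sdd": 796.80, "sde": 1113.60}
md0_blk_read_per_s = 2777.60

# The member rates should add up to what md0 reports.
total = sum(blk_read_per_s.values())

# Fraction of the array's read traffic served by each member disk.
shares = {dev: rate / total for dev, rate in blk_read_per_s.items()}

for dev, share in shares.items():
    print(f"{dev}: {share:.1%} of md0 reads")
```

If every read touched every disk equally, each share would be close to one
third. Here the busiest disk serves noticeably more than the least busy one.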

Of course my rows are small (1-2 KB), so I should rarely cross a block
boundary. With 1MB rows you are more likely to, so multiple data directories
might be better for you.
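
The block-boundary argument can be sketched as follows, assuming a plain
RAID0 layout in which fixed-size chunks are assigned to disks round-robin.
The function name disks_touched, the 64 KiB chunk size and the three-disk
array are hypothetical values chosen for illustration, not settings taken
from my machine.

```python
def disks_touched(offset, length, chunk_size, n_disks):
    """Return the set of disk indices that a read of `length` bytes
    starting at byte `offset` lands on, assuming chunks are laid out
    round-robin across `n_disks` disks (simple RAID0 striping)."""
    if length <= 0:
        return set()
    first_chunk = offset // chunk_size
    last_chunk = (offset + length - 1) // chunk_size
    # After n_disks consecutive chunks every disk has been hit once,
    # so there is no need to walk further than that.
    last_chunk = min(last_chunk, first_chunk + n_disks - 1)
    return {chunk % n_disks for chunk in range(first_chunk, last_chunk + 1)}


KIB = 1024
CHUNK = 64 * KIB   # hypothetical chunk size
DISKS = 3          # hypothetical array width

small_row = disks_touched(0, 2 * KIB, CHUNK, DISKS)            # fits in one chunk
straddling = disks_touched(63 * KIB, 2 * KIB, CHUNK, DISKS)    # crosses a boundary
large_row = disks_touched(0, 1024 * KIB, CHUNK, DISKS)         # spans many chunks

print(small_row, straddling, large_row)
```

Under these assumptions a 2 KiB row usually sits on one disk and only
occasionally straddles two, while a 1 MiB row spans every disk in the array.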

I think it all sort of depends on your data size.

-Anthony

On Mon, Apr 26, 2010 at 10:09:58PM +0200, Roland Hänel wrote:
> RAID0 decreases the performance of multiple, concurrent random reads because
> for each read request (I assume that at least a couple of stripe sizes are
> read), all hard disks are involved in that read.
> 
> Consider the following example: you want to read 1MB out of each of two
> files
> 
> a) both files are on the same RAID0 of two disks. For the first 1MB read
> request, both disks contain some stripes of this request, both disks have to
> move their heads to the correct location and do the read. The second read
> request has to wait until the first one finishes, because it is served from
> the same disks and depends on the same disk heads.
> 
> b) files are on separate disks. Both reads can be done at the same time,
> because disk heads can move independently.
> 
> Or look at it this way: if you issue a read request on a RAID0, and your
> disks have 8ms access time, then after the read request, the whole RAID0 is
> completely blocked for 8ms. If you handle the disks independently, only the
> disk containing the file is blocked.
> 
> RAID0 has its advantages of course. Streaming reads/writes (e.g. during a
> compaction) will be extremely fast.
> 
> -Roland
> 
> 
> 2010/4/26 Paul Prescod <paul@prescod.net>
> 
> > 2010/4/26 Roland Hänel <roland@haenel.me>:
> > > Ryan, I agree with you on the hot spots, however for the physical disk
> > > performance, even the worst case hot spot is not worse than RAID0: in a
> > > hot spot scenario, it might be that 90% of your reads go to one hard
> > > drive. But with RAID0, 100% of your reads will go to *all* hard drives.
> >
> > RAID0 is designed specifically to improve performance (both latency
> > and bandwidth). I'm unclear about why you think it would decrease
> > performance. Perhaps you're thinking of another RAID type?
> >
> >  Paul Prescod
> >

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <anthonym@alumni.caltech.edu>
