cassandra-user mailing list archives

From Jonathan Ellis <jbel...@gmail.com>
Subject Re: Multiple Data Directories
Date Fri, 26 Feb 2010 04:42:54 GMT
On Thu, Feb 25, 2010 at 3:54 PM, Anthony Molinaro
<anthonym@alumni.caltech.edu> wrote:
> What about the case where CPU and RAM are underutilized and your bottleneck
> is disk I/O (which often seems to be the case in EC2)?  Adding more
> spindles then improves overall throughput of the system.  I've actually
> tested this by adding an additional EBS volume, moving files around by
> hand, then restarting.  Suddenly the node's performance (measured via
> cfstats metrics) gets better.

That sounds like you're actually ram-limited, so adding nodes will be
better than adding EBS devices.

> How do the files ever get that big?  Does a repair fully compact (i.e.,
> down to one file)?  I guess the question is how do you end up with the
> "worst" case?

Any major compaction will do that.  Repair will invoke one, or it can
happen "naturally" too.

> I guess RAID0 is the only way to use multiple disks efficiently and
> the multiple DataFileDirectories is really not very useful?  I'm
> trying to think of a good reason you might want multiple data directories
> and I can't think of one right now.  Is there a good reason?

If you are throughput-bound instead of size-bound (which is the case
for most uses, especially on non-virtual hardware), then I would
expect better performance from JBOD.
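
For reference, a minimal sketch of what a multi-directory (JBOD) layout
looks like in storage-conf.xml, assuming the DataFileDirectories element
used by Cassandra releases of this period.  The mount points below are
hypothetical; the idea is that each one sits on its own physical disk:

```xml
<!-- Sketch only: paths are hypothetical, one per physical disk. -->
<DataFileDirectories>
    <DataFileDirectory>/mnt/disk1/cassandra/data</DataFileDirectory>
    <DataFileDirectory>/mnt/disk2/cassandra/data</DataFileDirectory>
</DataFileDirectories>
```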

-Jonathan
