incubator-cassandra-user mailing list archives

From Anthony Molinaro <anthonym@alumni.caltech.edu>
Subject Re: Multiple Data Directories
Date Thu, 25 Feb 2010 21:54:03 GMT
What about the case where CPU and RAM are underutilized and your bottleneck
is disk I/O (which often seems to be the case on EC2)?  Then adding more
spindles improves overall throughput of the system.  I've actually tested
this by adding an additional EBS volume, hand-moving files around, and
restarting.  Suddenly the node's performance (measured via cfstats metrics)
gets better, but eventually you need to run a cleanup (from adding another
node to the cluster to deal with network I/O bottlenecks), and everything
gets put back into one directory.  The disk bottleneck resurfaces.
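
In case it's useful, here's roughly the shuffle I do by hand, as a quick
sketch (paths are made up for illustration; the node is shut down first,
and I'm assuming each SSTable's -Data.db, -Index.db, and -Filter.db
components have to stay together in one directory):

#!/usr/bin/env python
# Sketch: move every other SSTable from a full data directory to a
# freshly attached EBS volume, keeping each table's component files
# (-Data.db, -Index.db, -Filter.db) together.  Paths are hypothetical.
import os
import shutil

SRC = "/var/lib/cassandra/data/Keyspace1"    # existing, full directory
DST = "/mnt/ebs1/cassandra/data/Keyspace1"   # newly attached EBS volume

if not os.path.isdir(DST):
    os.makedirs(DST)

# Collect SSTable name stems, e.g. "Standard1-42" from "Standard1-42-Data.db".
stems = sorted(name[:-len("-Data.db")]
               for name in os.listdir(SRC)
               if name.endswith("-Data.db"))

# Move every other SSTable so reads spread across both spindles.
for i, stem in enumerate(stems):
    if i % 2 == 1:
        for name in os.listdir(SRC):
            if name.startswith(stem + "-"):
                shutil.move(os.path.join(SRC, name), os.path.join(DST, name))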

So when compaction occurs I don't see it compact down to a single file;
looking through the directories I see several Data files for a CF of
various sizes (1.5-3G each).  The wiki seems to suggest that you can
eventually have file sizes in the multiple terabytes.  How do the files
ever get that big?  Does a repair fully compact (i.e., down to one file)?
I guess the question is how you end up with the "worst" case.

I guess Raid0 is the only way to use multiple disks efficiently, and the
multiple DataFileDirectories is really not very useful?  I'm trying to
think of a good reason you might want multiple data directories and I
can't think of one right now.  Is there a good reason?
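
(For anyone following along, by multiple data directories I mean listing
more than one DataFileDirectory in storage-conf.xml, something like this --
the paths are just examples:

<DataFileDirectories>
    <DataFileDirectory>/var/lib/cassandra/data</DataFileDirectory>
    <DataFileDirectory>/mnt/ebs1/cassandra/data</DataFileDirectory>
</DataFileDirectories>

-- which is what I've been experimenting with.)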

Anyway, I'll see if I can't try out some sort of Raid0 to see how
it performs.

Thanks,

-Anthony

On Thu, Feb 25, 2010 at 03:07:58PM -0600, Jonathan Ellis wrote:
> In the "worst" case, compaction combines them all into a single file
> anyway.  So I think your approach is flawed.  Cassandra is designed to
> allow adding capacity by adding nodes, not just by adding more space;
> otherwise your cpu / ram ratio will degrade.
> 
> On Thu, Feb 25, 2010 at 2:48 PM, Anthony Molinaro
> <anthonym@alumni.caltech.edu> wrote:
> > Okay, so the disk sizing seems to make sense for what I am seeing, the
> > disk which seems to get all the data is the largest.  On the new machines
> > which have 3 disks of equal size, compaction seems to be distributing
> > among the disks.
> >
> > Raid0 would sort of defeat the purpose of being able to add additional
> > capacity on the fly (i.e., adding EBS volumes to increase capacity), as
> > I need to know ahead of time what my configuration is.
> >
> > The new boxes ended up with all the data files in one directory because
> > of the bug in 0.5.0 when bootstrapping with multiple directories which
> > I worked around by using one directory, bootstrapping, then adding
> > the other directories.
> >
> > So in the situation where I've just added an additional directory
> > with space equal to a current directory, is there any way to redistribute
> > the data files?  The operations page makes me think that maybe nodeprobe
> > repair might do it, will it?
> >
> > Thanks,
> >
> > -Anthony
> >
> > On Thu, Feb 25, 2010 at 01:43:22PM -0600, Jonathan Ellis wrote:
> >> Compaction is why http://wiki.apache.org/cassandra/CassandraHardware
> >> recommends raid0-ing if you are concerned about free disk space
> >> limits.
> >>
> >> On Thu, Feb 25, 2010 at 1:36 PM, Gary Dusbabek <gdusbabek@gmail.com> wrote:
> >> > Cassandra always compacts to the directory with the most free space.
> >> > There is not a way to influence this.
> >> >
> >> > Gary
> >> >
> >> > On Thu, Feb 25, 2010 at 13:23, Anthony Molinaro
> >> > <anthonym@alumni.caltech.edu> wrote:
> >> >> Hi,
> >> >>
> >> >>  So is there any way to force distribution among DataFileDirectory
> >> >> entries when you add a new one?  Looking at the nodeprobe operations it
> >> >> seems like repair, which causes a major compaction, might do it?  I've
> >> >> tried shutting a node down, moving files around by hand, and starting
> >> >> up, but the next compaction seems to move everything back to a single
> >> >> directory?
> >> >>
> >> >>  I do see files show up in other directories as they are flushed, but
> >> >> then they all seem to make their way back to the first directory in
> >> >> the list.
> >> >>
> >> >> Thanks,
> >> >>
> >> >> -Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <anthonym@alumni.caltech.edu>
