incubator-cassandra-user mailing list archives

From Anthony Molinaro <antho...@alumni.caltech.edu>
Subject Re: Effective allocation of multiple disks
Date Thu, 11 Mar 2010 20:32:39 GMT
I'm still wondering what happens when you have something like two 500GB disks
with two sstables which use up 250GB, one on each disk, and then a major
compaction occurs.  Will it still compact and probably fill up a disk
(especially with the 2x overhead of compaction mentioned either here or on
the wiki)?
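
To put rough numbers on that scenario, here is a minimal Python sketch.  It
assumes each sstable is 250GB, that the compacted output is in the worst case
as large as the sum of its inputs, and that the output has to land in a single
data directory.  The function name and the simplified model are illustrative
assumptions, not Cassandra code.

```python
# Worst-case check: can a major compaction write its output to one disk?
# Assumption: output size equals the sum of the input sstables (nothing
# is purged or overwritten), and the output cannot span directories.

def fits_major_compaction(disk_capacity_gb, sstable_gb_per_disk):
    """Return (fits, output_gb, best_free_gb) for the worst case."""
    output_gb = sum(sstable_gb_per_disk)
    # Free space on each disk while the input sstables are still present.
    free_gb = [cap - used
               for cap, used in zip(disk_capacity_gb, sstable_gb_per_disk)]
    best_free_gb = max(free_gb)
    return output_gb <= best_free_gb, output_gb, best_free_gb


# Two 500GB disks, one 250GB sstable on each.
print(fits_major_compaction([500, 500], [250, 250]))
# prints (False, 500, 250)
```

Under these assumptions the output needs 500GB, but the emptiest disk only
has 250GB free until the inputs are deleted.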

Seems like you could easily get into a situation that you can't fix without
something like a volume manager, or a complete shutdown to move the data
onto a bigger disk.

I guess one way might be to treat each disk as a separate node (i.e., give
it some fraction of the keyspace based on its disk space), then when you
add a directory to the config you would have to load balance, but only
within that node.  I'm sure that complicates ring maintenance, but maybe
it's a better experience, as the multiple data directories should all fill
uniformly?
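
As a rough illustration of the "fraction of the keyspace based on its disk
space" idea, the sketch below splits a node's share of the keyspace across
its data directories in proportion to disk capacity.  This is a hypothetical
helper, not something Cassandra provides, and it ignores how the ranges
would actually be assigned or moved.

```python
from fractions import Fraction

def keyspace_shares(disk_capacity_gb):
    """Fraction of the node's keyspace each disk would own (hypothetical)."""
    total = sum(disk_capacity_gb)
    return [Fraction(cap, total) for cap in disk_capacity_gb]


# Two 500GB disks: each owns half of the node's range.
print(keyspace_shares([500, 500]))
# Adding a 1000GB directory changes every share, which is the
# "load balance within that node" step.
print(keyspace_shares([500, 500, 1000]))
```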

Just some other thoughts.

-Anthony

On Thu, Mar 11, 2010 at 12:45:14PM -0600, Jonathan Ellis wrote:
> Except that for a major compaction the whole thing gets put in one
> directory.  That's the problem w/ the JBOD approach.
> 
> On Thu, Mar 11, 2010 at 12:01 PM, Eric Evans <eevans@rackspace.com> wrote:
> > On Wed, 2010-03-10 at 23:20 -0600, Jonathan Ellis wrote:
> >> On Wed, Mar 10, 2010 at 9:31 PM, Anthony Molinaro
> >> <anthonym@alumni.caltech.edu> wrote:
> >> > I would almost recommend just keeping things simple and removing
> >> > multiple data directories from the config altogether and just
> >> > documenting that you should plan on using OS level mechanisms for
> >> > growing diskspace and io.
> >>
> >> I think that is a pretty sane suggestion actually.
> >
> > Or maybe leave the code as is and just document the situation more
> > clearly? If you're adding more disks to increase storage capacity and
> > you don't strictly need the extra IO, then multiple data directories
> > might be preferable to other forms of aggregation (it's certainly
> > simpler than say a volume manager).
> >
> > --
> > Eric Evans
> > eevans@rackspace.com
> >
> >

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <anthonym@alumni.caltech.edu>
