cassandra-user mailing list archives

From Anthony Molinaro <antho...@alumni.caltech.edu>
Subject Re: Recovery from botched compaction
Date Tue, 13 Apr 2010 20:59:31 GMT

On Tue, Apr 13, 2010 at 10:54:51AM -0500, Jonathan Ellis wrote:
> On Sat, Apr 10, 2010 at 2:24 PM, Anthony Molinaro
> <anthonym@alumni.caltech.edu> wrote:
> >  This is sort of a pre-emptive question, as the compaction I'm doing hasn't
> > failed yet, but I expect it to any time now.  I have a cluster which has been
> > storing user profile data for a client.  Recently I've had to go back and
> > reload all the data again.  I wasn't watching disk space, and on one of the
> > nodes it went above 50% (which I recall was bad), to somewhere around 70%.
> > I expected to get most of it back with a compaction (since most of the data
> > was the same, a compaction should remove the old copies), so I went ahead
> > and started one with nodeprobe compact (using 0.5.0 on this cluster).
> > However, I do see that the disk usage is growing (it's at 91% now).
> 
> Right, it can't remove any old data until the compacted version is written.
> 
> (This is where the 50% recommendation comes from: worst-case, the
> compacted version will take up exactly as much space as it did before,
> if there were no deletes or overwrites.)

I actually got lucky: while disk usage hovered at 91-95% full, the compaction
finished, and it's now at 60%.  However, I still have around a dozen or so
data files.  I thought 'nodeprobe compact' did a major compaction, and
that a major compaction would shrink everything down to one file?
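For reference, the 50% guideline mentioned above is just arithmetic: in the
worst case (no deletes or overwrites), the compacted output is exactly as
large as the input, and the input can only be removed after the output is
fully written.  A minimal sketch of that bound:

```python
# Sketch: why the 50%-free-disk guideline holds for a major compaction.
# Worst case (no deletes/overwrites): the compacted SSTable is exactly as
# large as the inputs, which can only be removed after it is written.

def compaction_is_safe(live_bytes, disk_bytes):
    """Peak usage = existing data + a full-size compacted copy of it."""
    peak = live_bytes + live_bytes  # worst case: output == input
    return peak <= disk_bytes

# A node at 50% usage just fits; starting at 70% risks filling the disk.
print(compaction_is_safe(50, 100))  # True  -> exactly at the limit
print(compaction_is_safe(70, 100))  # False -> can run out of space
```

In practice overwrites and tombstones shrink the output, which is why the
compaction in this thread squeaked through at 91-95% usage.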

> > So when the disk fills up and this compaction crashes what can I do?
> > I assume get a bigger disk, shut down the node, move the data and
> > restart will work, but do I have other options?
> > Which files can I ignore (i.e., can I leave the *-tmp-* files behind)?
> > Will my system be in a corrupt state?
> 
> It won't corrupt itself, and it will automatically remove the tmp files
> when it starts up.
> 
> If the disk fills up entirely, then the node will become unresponsive,
> even for reads, which is something we plan to fix.
> (https://issues.apache.org/jira/browse/CASSANDRA-809)
> 
> Otherwise there isn't a whole lot you can do with the "I need to put
> more data on my machine than I have room for" scenario.

Got it.  I already went ahead and added a few EBS volumes, raid0'd them, and
transferred the data over to them.  I was happy to recall that if I turned
off writes (not hard, as writes are all bulk on this cluster), the disk
files never change, so I was able to rsync while still serving reads :)
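That kind of copy can be sketched as a walk over the data directory that
skips in-progress compaction output.  The `*-tmp-*` glob matches the naming
discussed above, but the helper and paths here are illustrative assumptions;
it relies on SSTable data files being immutable once written, which is why
copying them is safe while writes are off:

```python
# Sketch: copy a Cassandra data directory to a new volume, skipping
# in-progress "*-tmp-*" compaction files (the node removes those on
# restart anyway, per the reply above). Paths/names are hypothetical.
import fnmatch
import os
import shutil

def copy_data_dir(src, dst):
    """Copy all data files from src to dst, excluding *-tmp-* files."""
    copied = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            if fnmatch.fnmatch(name, "*-tmp-*"):
                continue  # skip temporary compaction output
            rel = os.path.relpath(os.path.join(root, name), src)
            target = os.path.join(dst, rel)
            os.makedirs(os.path.dirname(target), exist_ok=True)
            shutil.copy2(os.path.join(root, name), target)
            copied.append(rel)
    return copied
```

Real moves would use rsync as described, but the filter logic is the same:
finished SSTables go, temporary files stay behind.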

> > This machine is one in a set of 6, and since I didn't choose tokens
> > initially, they are very lopsided (i.e., some use 20% of their disk, others
> > 60-70%).  If I were to start moving tokens around, would the machines short
> > of space be able to anti-compact without filling up?  Or does
> > anti-compaction, like compaction, require 2x disk space?
> 
> Anticompaction requires as much space as the data being transferred,
> so the worst case of transferring 100% of it off would require 2x.
> 
> https://issues.apache.org/jira/browse/CASSANDRA-579 will fix this for
> the anticompaction case.

Okay, sounds good.  I may leave it for the moment, as the last time I tried
any sort of move/decommission with 0.5.x I was unable to figure out whether
anything was happening, so I may just wait and revisit when I upgrade.
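For whenever the tokens do get revisited: balanced initial tokens for the
RandomPartitioner are just evenly spaced points in its 0..2**127 hash range,
which avoids the lopsided load described above.  A sketch of the usual
calculation, assuming a 6-node ring:

```python
# Sketch: evenly spaced initial tokens for Cassandra's RandomPartitioner,
# whose token range is [0, 2**127). Evenly spaced tokens give each node
# an equal share of the ring.

def balanced_tokens(num_nodes):
    """Return one token per node, evenly spaced around the ring."""
    return [i * (2 ** 127) // num_nodes for i in range(num_nodes)]

for i, token in enumerate(balanced_tokens(6)):
    print("node %d: token %d" % (i, token))
```

Each node would then be assigned its token via nodeprobe move (or as its
InitialToken before first start), one node at a time.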

Thanks for the answers,

-Anthony

-- 
------------------------------------------------------------------------
Anthony Molinaro                           <anthonym@alumni.caltech.edu>
