cassandra-user mailing list archives

From Omer van der Horst Jansen <ome...@gmail.com>
Subject Mitigating CASSANDRA-2059 -- leftover files
Date Thu, 03 Feb 2011 15:45:50 GMT
Jonathan pointed out in another thread that it looks like I'm running
into CASSANDRA-2059, where secondary files are not being properly
deleted. My production data set at any given time is less than 100 MB
in size, but the Cassandra data directories on each instance are using
30 to 40 times as much space right now, and steadily growing.

I understand I can remove the root cause of the problem by applying
the patch that's attached to the bug report or by upgrading to 0.7.1
when it's out.

In the meantime, is it safe to manually delete stale files while
Cassandra is running?  And how do I determine when a set of files is
stale?

I'd assume that a given set of files is deletable if there is no
-Data.db file and the -Compacted file has zero length.

Example of what I would think is a set of stale files, without a -Data.db file:

ls -l *3090*
-rw-rw-r-- 1 user group    0 Feb  3 10:00 Payload-e-3090-Compacted
-rw-rw-r-- 1 user group  245 Feb  3 10:00 Payload-e-3090-Filter.db
-rw-rw-r-- 1 user group 4362 Feb  3 10:00 Payload-e-3090-Index.db
-rw-rw-r-- 1 user group 4840 Feb  3 10:00 Payload-e-3090-Statistics.db

I've got these all the way back to Payload-e-1-Index.db.

Non-stale files:
ls -l *3095*
-rw-rw-r-- 1 user group        0 Feb  3 10:35 Payload-e-3095-Compacted
-rw-rw-r-- 1 user group 41269735 Feb  3 10:14 Payload-e-3095-Data.db
-rw-rw-r-- 1 user group   286405 Feb  3 10:14 Payload-e-3095-Filter.db
-rw-rw-r-- 1 user group  7608022 Feb  3 10:14 Payload-e-3095-Index.db
-rw-rw-r-- 1 user group     4840 Feb  3 10:14 Payload-e-3095-Statistics.db

There is an active Data.db file, so I'd leave this group alone.
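
The staleness rule assumed above (a zero-length -Compacted marker and no -Data.db file for the same generation) could be checked with something like the following Python sketch. It only lists candidate groups and deletes nothing; whether removing them is safe while Cassandra is running is still the open question. The function name find_stale_candidates and the file name pattern are illustrative assumptions based on the listings above, not anything taken from Cassandra itself.

```python
import os
import re
from collections import defaultdict

# Assumed naming scheme, inferred from the listings above:
#   <ColumnFamily>-<version>-<generation>-<Component>[.db]
NAME_RE = re.compile(
    r"^(?P<cf>.+)-(?P<ver>[a-z]+)-(?P<gen>\d+)-(?P<comp>[A-Za-z]+)(?:\.db)?$"
)


def find_stale_candidates(data_dir):
    """Return {(column_family, generation): [file names]} for every group
    that has a zero-length -Compacted marker and no -Data.db file."""
    groups = defaultdict(dict)
    for name in os.listdir(data_dir):
        match = NAME_RE.match(name)
        if match is None:
            continue  # not an SSTable component under the assumed scheme
        key = (match.group("cf"), int(match.group("gen")))
        groups[key][match.group("comp")] = name

    stale = {}
    for key, components in groups.items():
        marker = components.get("Compacted")
        if marker is None or "Data" in components:
            continue  # no marker, or a Data.db is still present: leave alone
        if os.path.getsize(os.path.join(data_dir, marker)) == 0:
            stale[key] = sorted(components.values())
    return stale
```

Run against the directory listed above, this would report the 3090 group and skip the 3095 group, since the latter still has its Data.db file.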

--Omer
