incubator-cassandra-user mailing list archives

From Boris Yen <yulin...@gmail.com>
Subject Major compaction does not seem to free much disk space if wide rows are used.
Date Thu, 16 May 2013 08:07:13 GMT
Hi All,

Sorry for the wide distribution.

We are running Cassandra 1.0.10. Recently we have run into a weird
situation. We have a column family containing wide rows (each row might
have a few million columns). We delete columns on a daily basis, and
we also run a major compaction on it every day to free up disk space
(gc_grace is set to 600 seconds).

However, every time we run the major compaction, only 1 or 2 GB of disk
space is freed. We tried deleting most of the data before running the
compaction, but the result was pretty much the same.

So we checked the source code. It seems that column tombstones can only
be purged when the row key is not present in any other sstable. I know a
major compaction should include all sstables; however, in our use case,
columns get inserted rapidly. This makes Cassandra flush memtables to
disk and create new sstables. The newly created sstables will have the
same keys as the sstables that are being compacted (the compaction takes
2 or 3 hours to finish). My question is: could these newly created
sstables be the reason why most of the column tombstones are not being
purged?
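
To make the question concrete, the rule described above can be sketched as
a toy model. This is only an illustration of our reading of the source, not
Cassandra's actual code: the names SSTable and can_purge are hypothetical,
and an sstable is reduced to the set of row keys it contains. The sketch
assumes a tombstone may be dropped only if it is older than gc_grace and no
sstable outside the compaction holds the same row key.

```python
from dataclasses import dataclass, field

GC_GRACE_SECONDS = 600  # gc_grace value mentioned in this message


@dataclass
class SSTable:
    """Hypothetical stand-in for an sstable: only the row keys it holds."""
    name: str
    row_keys: set = field(default_factory=set)


def can_purge(row_key, deleted_at, now, all_sstables, compacting):
    """Return True if a column tombstone for row_key may be dropped.

    Assumed, simplified rule: the tombstone must be older than gc_grace,
    and no sstable outside the compaction set may contain the row key.
    """
    if now - deleted_at < GC_GRACE_SECONDS:
        return False
    compacting_names = {s.name for s in compacting}
    for sstable in all_sstables:
        if sstable.name not in compacting_names and row_key in sstable.row_keys:
            return False
    return True


# sstables that exist when the major compaction starts
old = [SSTable("a", {"wide_row_1"}), SSTable("b", {"wide_row_1"})]

# Case 1: writes are stopped, so nothing is flushed during the compaction.
quiet = can_purge("wide_row_1", deleted_at=0, now=3600,
                  all_sstables=old, compacting=old)

# Case 2: a memtable flush during the compaction creates a new sstable
# that holds the same wide row key but is not part of the compaction.
flushed = SSTable("c", {"wide_row_1"})
busy = can_purge("wide_row_1", deleted_at=0, now=3600,
                 all_sstables=old + [flushed], compacting=old)

print(quiet, busy)
```

Under these assumptions the tombstone is purgeable in the first case and
kept in the second, which would match what we see on disk.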

P.S. We also ran another test. We inserted data into the same CF with the
same wide-row pattern and deleted most of the data. This time we stopped
all writes to Cassandra before running the compaction. Disk usage
decreased dramatically.

Any suggestions, or is this a known issue?

Thanks and Regards,
Boris
