cassandra-user mailing list archives

From "Durity, Sean R" <SEAN_R_DUR...@homedepot.com>
Subject RE: Massive deletes -> major compaction?
Date Thu, 21 Sep 2017 20:44:33 GMT
So, let me make sure my assumptions are correct (and let others learn as well):


- A major compaction would read all sstables at once (ignoring the max_threshold), thus the
potential for needing double the disk space (of course, if it wrote 30% less, it wouldn’t
be double…)

- A major compaction would leave one massive sstable that wouldn’t get automatically
compacted for a long time

- A user-defined compaction on 1 sstable would not evict any tombstoned data that is in any
other sstable (like a newer one with the deletes…). It would only remove data if the
tombstone is already in the same sstable.


Sean Durity

From: Jeff Jirsa [mailto:jjirsa@gmail.com]
Sent: Thursday, September 21, 2017 2:51 PM
To: user@cassandra.apache.org
Subject: Re: Massive deletes -> major compaction?

The major compaction is most efficient but can temporarily double (nearly) disk usage - if
you can afford that, go for it.
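As a rough back-of-the-envelope sketch of that "nearly double" figure: the old sstables stay on disk until the new one is fully written, so the peak is roughly the input size plus the output size. The function name and the purge fraction below are illustrative assumptions, not measurements from this cluster.

```python
def peak_disk_usage_gb(live_gb: float, purge_fraction: float) -> float:
    """Estimate peak disk usage (GB) while a major compaction is running.

    live_gb        -- current on-disk size of the table's sstables
    purge_fraction -- assumed share of that data the compaction will drop (0.0-1.0)
    """
    if not 0.0 <= purge_fraction <= 1.0:
        raise ValueError("purge_fraction must be between 0.0 and 1.0")
    # Input sstables are only deleted after the output sstable is complete,
    # so both exist on disk at the same time at the peak.
    output_gb = live_gb * (1.0 - purge_fraction)
    return live_gb + output_gb


# Hypothetical table: 1000 GB on disk, 30% of it expected to be purged.
print(peak_disk_usage_gb(1000, 0.30))
```

With nothing to purge the estimate is exactly double; the more the compaction drops, the further below double the peak lands.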

Alternatively you can do a user-defined compaction on each sstable in reverse generational
order (oldest first) and as long as the data is minimally overlapping it’ll purge tombstones
that way as well - takes longer but much less disk involved.
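A minimal sketch of working out that oldest-first order from the file names in a table's data directory. It assumes the 2.0-era naming layout `<keyspace>-<table>-<version>-<generation>-Data.db` and treats a lower generation number as "older"; the helper name `oldest_first` is hypothetical. It only produces the ordering and does not trigger any compaction itself.

```python
import re
from pathlib import Path

# Assumed file name layout: <keyspace>-<table>-<version>-<generation>-Data.db
_DATA_FILE = re.compile(
    r"^(?P<ks>[^-]+)-(?P<cf>[^-]+)-(?P<ver>[a-z]+)-(?P<gen>\d+)-Data\.db$"
)


def oldest_first(filenames):
    """Return the Data.db file names sorted by ascending generation number."""
    parsed = []
    for name in filenames:
        match = _DATA_FILE.match(Path(name).name)
        if match:  # ignore Index.db, Filter.db, tmp files and anything else
            parsed.append((int(match.group("gen")), name))
    return [name for _, name in sorted(parsed)]


# Hypothetical directory listing for keyspace "ks", table "events".
listing = [
    "ks-events-jb-12-Data.db",
    "ks-events-jb-3-Data.db",
    "ks-events-jb-3-Index.db",
    "ks-events-jb-101-Data.db",
]
for name in oldest_first(listing):
    print(name)
```

The resulting list can then be handed, one file at a time, to whatever mechanism you use for user-defined compaction (commonly the CompactionManager JMX operation forceUserDefinedCompaction; check the exact arguments for your version).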


--
Jeff Jirsa


On Sep 21, 2017, at 11:27 AM, Durity, Sean R <SEAN_R_DURITY@homedepot.com> wrote:
Cassandra version 2.0.17 (yes, it’s old – waiting for new hardware/new OS to upgrade)

In a long-running system with billions of rows, TTL was not set. So a one-time purge is being
planned to reduce disk usage. Records older than a certain date will be deleted. The table
uses size-tiered compaction. Deletes are probably 25-40% of the complete data set. To actually
recover the disk space, would you recommend a major compaction after the gc_grace_seconds
time? I expect compaction would then need to be scheduled regularly (ick)…
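For the timing part of the question, a small sketch of when the purge tombstones first become droppable: a tombstone can only be removed by compaction once gc_grace_seconds have passed since the delete was written. The default of 864000 seconds (10 days) is used here as an assumption; the table's actual setting and the delete date are placeholders.

```python
from datetime import datetime, timedelta, timezone


def earliest_purge_time(delete_time: datetime, gc_grace_seconds: int = 864000) -> datetime:
    """Earliest moment a compaction may drop tombstones written at delete_time."""
    return delete_time + timedelta(seconds=gc_grace_seconds)


# Hypothetical: the last delete of the purge finishes on 2017-09-21 00:00 UTC.
last_delete = datetime(2017, 9, 21, tzinfo=timezone.utc)
print(earliest_purge_time(last_delete).isoformat())
```

Compacting before that time would rewrite the data but keep the tombstones, so the disk space would not come back yet.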

We also plan to re-insert the remaining data with a calculated TTL, which could also benefit
from compaction.


Sean Durity

________________________________

The information in this Internet Email is confidential and may be legally privileged. It is
intended solely for the addressee. Access to this Email by anyone else is unauthorized. If
you are not the intended recipient, any disclosure, copying, distribution or any action taken
or omitted to be taken in reliance on it, is prohibited and may be unlawful. When addressed
to our clients any opinions or advice contained in this Email are subject to the terms and
conditions expressed in any applicable governing The Home Depot terms of business or client
engagement letter. The Home Depot disclaims all responsibility and liability for the accuracy
and content of this attachment and for any damages or losses arising from any inaccuracies,
errors, viruses, e.g., worms, trojan horses, etc., or other items of a destructive nature,
which may be contained in this attachment and shall not be liable for direct, indirect, consequential
or special damages in connection with this e-mail message or its attachment.
