incubator-cassandra-user mailing list archives

From Nish garg <>
Subject Re: OOM while performing major compaction
Date Thu, 27 Feb 2014 23:03:40 GMT
Hello Tupshin,

Yes, all the data needs to be kept for only the last 6 hours. Changing to a
new CF every 6 hours does solve the compaction issue, but right after the
switch we would have less than 6 hours of data. We could use CF1 and CF2 and
truncate them alternately every 6 hours in a loop, but we would need some
kind of view that returns (CF1 union CF2) for the final data. Unfortunately,
views are not supported in Cassandra. We could change our code to query both
CFs, but that is a hack and does not seem like a perfect solution.

On Thu, Feb 27, 2014 at 4:49 PM, Tupshin Harper <> wrote:

> If you can programmatically roll over onto a new column family every 6
> hours (or every day or other reasonable increment), and then just drop your
> existing column family after all the columns would have been expired, you
> could skip your compaction entirely. It was not clear to me from your
> description whether *all* of the data only needs to be retained for 6
> hours. If that is true, rolling over to a new cf will be your simplest
> option.
> -Tupshin
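> The rollover scheme above can be sketched as a time-bucketed CF name:
> writes go to the CF for the current 6-hour window, and the CF two windows
> back can be dropped wholesale. The events_<bucket> naming is illustrative,
> not from this thread:

```python
import time

WINDOW_SECONDS = 6 * 3600  # roll to a new column family every 6 hours

def cf_for(ts=None):
    """Name of the column family covering the 6-hour window containing ts."""
    bucket = int(ts if ts is not None else time.time()) // WINDOW_SECONDS
    return "events_%d" % bucket

def cf_to_drop(ts=None, keep_windows=2):
    """The bucket old enough that all of its data has expired.

    Keeping two windows guarantees a full 6 hours of data is always
    available (the current window plus the previous one).
    """
    bucket = int(ts if ts is not None else time.time()) // WINDOW_SECONDS
    return "events_%d" % (bucket - keep_windows)
```

> Dropping a whole CF discards its sstables outright, so no tombstones and
> no compaction are involved in reclaiming the space.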
> On Thu, Feb 27, 2014 at 5:31 PM, Nish garg <> wrote:
>> Thanks for replying.
>> We are on Cassandra 1.2.9.
>> We have time-series-like data where we need to keep only the last 6
>> hours. We expire data using an expireddatetime column on the column
>> family, and we run an expire script via cron to create the tombstones. We
>> don't use TTLs yet; we plan to in a future release, and hopefully that
>> will fix some of the issues caused by the expire script, since the script
>> has to read the data before it can create tombstones.
>> So to answer your question, almost 80% of those sstables is tombstones.
>> (There is no easy way to confirm this short of converting all 33,000
>> sstables to JSON and querying them for tombstones.)
>> The 33,000 sstables may be due to machine load being too high for minor
>> compaction to keep up, or something may have happened to the minor
>> compaction thread on this node. The other two nodes in the cluster are
>> fine.
>> Yes, we are using the size-tiered compaction strategy.
>> I am inclined towards decommissioning and bootstrapping this node, as
>> performing a major compaction on it seems impossible.
>> However still looking for other solutions...
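>> The "convert to JSON and query for tombstones" estimate mentioned above
>> can be scripted against sstable2json output. This sketch assumes the
>> 1.x-era format where a column with a fourth element of "d" is a deletion
>> marker (the exact output shape varies by version, so it accepts both the
>> object and array forms); treat the result as an estimate:

```python
import json

def tombstone_ratio(sstable_json_text):
    """Rough tombstone ratio from sstable2json output.

    Assumption: a column serialized with a fourth element of "d" is a
    deletion marker. Handles both output shapes seen in 1.x tools:
    {row_key: [columns]} and [{"key": ..., "columns": [...]}].
    """
    rows = json.loads(sstable_json_text)
    if isinstance(rows, dict):
        column_lists = rows.values()
    else:
        column_lists = (row.get("columns", []) for row in rows)
    total = deleted = 0
    for columns in column_lists:
        for col in columns:
            total += 1
            if len(col) > 3 and col[3] == "d":
                deleted += 1
    return deleted / float(total) if total else 0.0
```

>> Sampling a handful of the 33,000 sstables this way would be far cheaper
>> than converting all of them.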
>> On Thu, Feb 27, 2014 at 4:03 PM, Robert Coli <>wrote:
>>> On Thu, Feb 27, 2014 at 11:09 AM, Nish garg <> wrote:
>>>> I am having an OOM during major compaction on one of the column
>>>> families, where there are a lot of SSTables (33,000) to be compacted. Is
>>>> there any other way for them to be compacted? Any help will be really
>>>> appreciated.
>>> You can use user defined compaction to reduce the working set, but only
>>> a major compaction is capable of purging 100% of tombstones.
>>> How much garbage is actually in the files? Why do you have 33,000 of
>>> them? You mention a major compaction, so you are likely not using LCS with
>>> the bad 5MB default sstable size... how did you end up with so many
>>> SSTables?
>>> Have you removed the throttle from compaction, generally?
>>> What version of Cassandra?
>>> =Rob
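>>> The user defined compaction mentioned above is driven per batch of
>>> sstable filenames through the CompactionManager JMX MBean's
>>> forceUserDefinedCompaction operation (its exact signature varies by
>>> version). A sketch of splitting the 33,000 tables into small groups,
>>> with a placeholder submit() where the real JMX call would go:

```python
def batches(sstables, batch_size=32):
    """Split sstable filenames into small groups for user defined compaction.

    Compacting a few dozen tables at a time keeps the working set (and
    heap pressure) far below that of one 33,000-table major compaction.
    """
    for i in range(0, len(sstables), batch_size):
        yield sstables[i:i + batch_size]

def submit(batch):
    """Placeholder for the real call, e.g. invoking
    CompactionManager.forceUserDefinedCompaction over JMX (jmxterm,
    jconsole) with a comma-separated list of data file names."""
    print("compact: " + ",".join(batch))
```

>>> Note that this only shrinks the sstable count incrementally; as said
>>> above, only a major compaction purges 100% of the tombstones.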
