cassandra-user mailing list archives

From aaron morton <>
Subject Re: Row or Supercolumn with approximately n columns
Date Mon, 02 Jan 2012 22:59:35 GMT
During compaction, both automatic / minor and manual / major. 

The performance drop comes from having a lot of expired columns that have not yet been purged
by compaction, as they must still be read and discarded during reads.
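
To illustrate the read cost described above, here is a minimal sketch using a toy in-memory model. It is not Cassandra code: the row layout, read_live_columns and compact are hypothetical names made up for this example, and the model ignores tombstones and any grace period.

```python
# Toy model (assumption): a row is a dict of column name -> (value, expires_at).
# expires_at is a Unix timestamp, or None for a column written without a TTL.

def read_live_columns(row, now):
    """Return the live columns and the number of expired columns scanned."""
    live = {}
    skipped = 0
    for name, (value, expires_at) in row.items():
        if expires_at is not None and expires_at <= now:
            # The expired column is still stored, so the read has to look at it
            # before it can discard it.
            skipped += 1
            continue
        live[name] = value
    return live, skipped


def compact(row, now):
    """Simplified stand-in for compaction: physically drop expired columns."""
    return {
        name: (value, expires_at)
        for name, (value, expires_at) in row.items()
        if expires_at is None or expires_at > now
    }


now = 1000.0
# 100 columns whose TTL has already passed, plus one live column.
row = {"c%03d" % i: ("v", 900.0) for i in range(100)}
row["fresh"] = ("v", 2000.0)

live_before, skipped_before = read_live_columns(row, now)
live_after, skipped_after = read_live_columns(compact(row, now), now)
```

In this model the read before compaction scans and discards all 100 expired columns to return a single live one, while the same read after compaction discards nothing.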


Aaron Morton
Freelance Developer

On 3/01/2012, at 10:38 AM, R. Verlangen wrote:

> @Aaron: Small side question, when do columns with a past TTL get removed? On a repair,
(minor) compaction, or .. ? Does it have a performance drop if that's happening?
> 2012/1/2 aaron morton <>
> Even if you had compaction enforcing a limit on the number of columns in a row, there
would still be issues with concurrent writes and with read-repair, i.e. node a says
these are the first n columns but node b says something else, and you only know who is
correct at read time.
> Have you considered using a TTL on the columns ? 
> Depending on the use case, you could also consider having writes periodically or randomly
trim the data size, or trim on reads. 
> It will also make sense to partition the time series data into different rows, and Viva
la Standard Column Families!
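
One possible way to partition time series data into different rows is to put a time bucket in the row key, sketched below. The key format, the one-day bucket size and both function names are assumptions made for illustration and are not taken from the thread or from any Cassandra API.

```python
# Sketch (assumption): row key = "<series id>:<bucket start, Unix seconds>".

def bucket_row_key(series_id, timestamp, bucket_seconds=86400):
    """Return the row key of the time bucket that contains `timestamp`."""
    bucket_start = int(timestamp) // bucket_seconds * bucket_seconds
    return "%s:%d" % (series_id, bucket_start)


def stale_row_keys(series_id, now, keep_buckets, known_keys, bucket_seconds=86400):
    """Return the keys in `known_keys` older than the newest `keep_buckets` buckets."""
    current_bucket = int(now) // bucket_seconds
    oldest_kept = (current_bucket - (keep_buckets - 1)) * bucket_seconds
    prefix = series_id + ":"
    return [
        key
        for key in known_keys
        if key.startswith(prefix) and int(key.rsplit(":", 1)[1]) < oldest_kept
    ]


# Four consecutive daily buckets for one hypothetical series.
keys = [bucket_row_key("sensor1", day * 86400 + 5) for day in range(4)]
stale = stale_row_keys("sensor1", now=3 * 86400 + 5, keep_buckets=2, known_keys=keys)
```

With this layout, stale data can be dropped a whole row at a time, and each row holds roughly one bucket's worth of columns rather than growing without bound.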
> Hope that helps. 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> On 25/12/2011, at 7:48 PM, Praveen Baratam wrote:
>> Hello Everybody,
>> Happy Christmas.
>> I know that this topic has come up quite a few times on the Dev and User lists but did
not culminate in a solution.
>> The above discussion on the User list talks about AbstractCompactionStrategy, but I could
not find any relevant documentation as it's a fairly new feature in Cassandra.
>> Let me state this necessity and use-case again.
>> I need a ColumnFamily (CF) wide or SuperColumn (SC) wide option to approximately
limit the number of columns to "n". "n" can vary a lot and the intention is to throw away
stale data and not to maintain any hard limit on the CF or SC. It's very useful for storing
time-series data where stale data is not necessary. The goal is to achieve this with minimum
overhead and since compaction happens all the time it would be clever to implement it as part
of compaction.
>> Thanks in advance.
>> Praveen
