cassandra-user mailing list archives

From Aaron Turner <>
Subject Re: Data modeling advice (time series)
Date Tue, 01 May 2012 18:00:49 GMT
On Tue, May 1, 2012 at 10:20 AM, Tim Wintle <> wrote:
> I believe that the general design for time-series schemas looks
> something like this (correct me if I'm wrong):
> (storing time series for X dimensions for Y different users)
> But I've not found much advice on calculating optimal bucket sizes (i.e.
> optimal number of columns per row), and how that decision might be
> affected by compression (or how significant the performance differences
> between the two options might be).
> Are the calculations here still considered valid (proportionally) in
> 1.X, with the changes to SSTables, or is it significantly different?
> <>

Tens or a few hundred MB per row seems reasonable.  You could do
thousands/MB if you wanted to, but that can make things harder to

Depending on the size of your data, you may find that the overhead of
each column becomes significant, far more than the per-row overhead.
Since all of my data is just 64-bit integers, I ended up taking a day's
worth of values (288/day @ 5min intervals) and storing it as a single
column as a vector.  Hence I have two CFs:

StatsDaily  -- each row == 1 day, each column = 1 stat @ 5min intervals
StatsDailyVector -- each row == 1 year, each column = 288 stats @ 1
day intervals
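
For illustration only, packing a day of samples into one column value
could look roughly like the Python below.  This is a sketch, not the
code behind these CFs: the function names, the big-endian signed
layout, and the two ASSUMED_* byte counts are assumptions made for the
example, not measured figures.

```python
import struct

SAMPLES_PER_DAY = 288  # 24 hours * 12 five-minute samples per hour
VALUE_BYTES = 8        # each stat is a 64-bit integer

# Assumptions for the size comparison only (hypothetical, not measured):
ASSUMED_COLUMN_OVERHEAD = 15  # fixed bytes of bookkeeping per column
ASSUMED_NAME_BYTES = 8        # column name stored as a 64-bit timestamp


def pack_day(values):
    """Pack one day of signed 64-bit samples into a single bytes value."""
    if len(values) != SAMPLES_PER_DAY:
        raise ValueError("expected %d samples, got %d"
                         % (SAMPLES_PER_DAY, len(values)))
    # ">288q" = big-endian, 288 signed 64-bit integers, no padding
    return struct.pack(">%dq" % SAMPLES_PER_DAY, *values)


def unpack_day(blob):
    """Inverse of pack_day: bytes back to a list of 288 integers."""
    return list(struct.unpack(">%dq" % SAMPLES_PER_DAY, blob))


def bytes_as_columns():
    """Approximate bytes for one day stored as 288 separate columns."""
    per_column = ASSUMED_COLUMN_OVERHEAD + ASSUMED_NAME_BYTES + VALUE_BYTES
    return SAMPLES_PER_DAY * per_column


def bytes_as_vector():
    """Approximate bytes for one day stored as a single packed column."""
    payload = SAMPLES_PER_DAY * VALUE_BYTES
    return ASSUMED_COLUMN_OVERHEAD + ASSUMED_NAME_BYTES + payload
```

With these assumed figures, the per-column bookkeeping is paid once per
day instead of 288 times, which is where the saving comes from.  The
trade-off is that a reader has to fetch and unpack the whole day to get
at a single sample.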

Every night a job kicks off and converts each row's worth of
StatsDaily into a column in StatsDailyVector.  By doing it 1:1 this
way, I also reduce the number of tombstones I need to write in
StatsDaily since I only need one tombstone for the row delete, rather
than 288 (one for each column deleted).
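
The shape of such a nightly conversion can be sketched as below.  This
is a hedged illustration, not the actual job: plain Python dicts stand
in for the two CFs, and the function name, the key arguments, and the
assumption that column names sort chronologically are all hypothetical.

```python
import struct


def nightly_rollup(stats_daily, stats_daily_vector,
                   row_key, year_key, day_name):
    """Convert one StatsDaily row into one StatsDailyVector column.

    stats_daily:        dict row_key  -> {column_name: int}  (stand-in CF)
    stats_daily_vector: dict year_key -> {day_name: bytes}   (stand-in CF)
    Returns the number of deletes issued against stats_daily.
    """
    row = stats_daily[row_key]
    # Assumption: column names sort in chronological order.
    values = [row[name] for name in sorted(row)]
    # Pack every sample as a big-endian signed 64-bit integer.
    blob = struct.pack(">%dq" % len(values), *values)
    stats_daily_vector.setdefault(year_key, {})[day_name] = blob
    # One delete for the whole row, rather than one per column.
    del stats_daily[row_key]
    return 1
```

Because each source row maps to exactly one destination column, the
cleanup is a single row-level delete no matter how many samples the row
held.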

I don't use compression.

Aaron Turner         Twitter: @synfinatic - Pcap editing and replay tools for Unix & Windows
Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"carpe diem quam minimum credula postero"
