incubator-cassandra-user mailing list archives

From Tim Wintle <timwin...@gmail.com>
Subject Re: Data modeling advice (time series)
Date Wed, 02 May 2012 15:22:52 GMT
On Tue, 2012-05-01 at 11:00 -0700, Aaron Turner wrote:
> Tens or a few hundred MB per row seems reasonable.  You could do
> thousands/MB if you wanted to, but that can make things harder to
> manage.

Thanks (both Aarons).

> Depending on the size of your data, you may find that the overhead of
> each column becomes significant; far more than the per-row overhead.
> Since all of my data is just 64bit integers, I ended up taking a day's
> worth of values (288/day @ 5min intervals) and storing it as a single
> column as a vector.

By "vector" do you mean a raw binary array of long ints?
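
To make the question concrete, here is a minimal sketch of one such encoding, assuming big-endian signed 64-bit values and a fixed 288 samples per day. The names pack_day and unpack_day are hypothetical, and this is not necessarily the layout Aaron uses.

```python
import struct

SAMPLES_PER_DAY = 288  # 24 hours * 12 five-minute intervals

# Assumed layout: 288 big-endian signed 64-bit integers, back to back.
DAY_FORMAT = ">%dq" % SAMPLES_PER_DAY


def pack_day(values):
    """Pack one day's worth of 64-bit integers into a single binary blob."""
    if len(values) != SAMPLES_PER_DAY:
        raise ValueError("expected %d values, got %d" % (SAMPLES_PER_DAY, len(values)))
    return struct.pack(DAY_FORMAT, *values)


def unpack_day(blob):
    """Recover the list of integers from a blob written by pack_day."""
    return list(struct.unpack(DAY_FORMAT, blob))
```

Stored this way, a day occupies 288 * 8 = 2304 bytes of column value, and the per-column overhead is paid once instead of 288 times.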

That sounds very nice for reducing overhead - but I'd like to work
with counters (I was going to rely on them for streaming "real-time"
updates).

Is that why you've got the two CFs described below (to have an archived
summary and a live version that can have counters), or do you have no
contention over writes/increments for individual values?

>   Hence I have two CF's:
> 
> StatsDaily  -- each row == 1 day, each column = 1 stat @ 5min intervals
> StatsDailyVector -- each row == 1 year, each column = 288 stats @ 1
> day intervals
> 
> Every night a job kicks off and converts each row's worth of
> StatsDaily into a column in StatsDailyVector.  By doing it 1:1 this
> way, I also reduce the number of tombstones I need to write in
> StatsDaily since I only need one tombstone for the row delete, rather
> than 288 for each column deleted.
> 
> I don't use compression.
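
The nightly conversion described in the quoted text can be sketched roughly as follows. Plain dicts stand in for the two CFs and no Cassandra client calls are shown. The function name roll_up_day, the key formats in the usage note, and the zero-filling of missing intervals are all assumptions made for illustration.

```python
import struct

SAMPLES_PER_DAY = 288  # one stat every 5 minutes


def roll_up_day(stats_daily, stats_daily_vector, day_row_key, year_row_key, day_column):
    """Convert one StatsDaily row into one StatsDailyVector column.

    stats_daily:        {row_key: {interval_index: value}}  (stand-in for StatsDaily)
    stats_daily_vector: {row_key: {day_column: blob}}       (stand-in for StatsDailyVector)
    """
    columns = stats_daily[day_row_key]
    # Assumption: intervals with no recorded value are stored as 0.
    values = [columns.get(i, 0) for i in range(SAMPLES_PER_DAY)]
    blob = struct.pack(">%dq" % SAMPLES_PER_DAY, *values)

    # One column per day in the yearly row.
    stats_daily_vector.setdefault(year_row_key, {})[day_column] = blob

    # Stands in for a single row-level delete (one tombstone),
    # instead of deleting each of the 288 columns individually.
    del stats_daily[day_row_key]
    return blob
```

For example, with a hypothetical key scheme, roll_up_day(daily, vector, "host1:2012-05-01", "host1:2012", "2012-05-01") moves that day's 288 values into one column of the "host1:2012" row and removes the daily row.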


