incubator-cassandra-user mailing list archives

From Ahmy Yulrizka
Subject Data Model for time series with multiple interval / sub-sample
Date Thu, 24 Oct 2013 07:30:39 GMT

I'm very new to Cassandra and am trying it out. I have a couple of questions
regarding the design of the database.

We have an API that stores time series sensor data with millisecond precision.
Users can perform CRUD operations through the RESTful API. By default, when
retrieving data, users can specify `start_date` and `end_date`, which are
epoch timestamps.

Every GET request is paginated, with a maximum of 1000 items per page. Users
can also specify the interval of the data, in seconds, as one of 604800
(1 week), 86400 (1 day), 3600 (1 hr), 1800 (30 min), 600 (10 min),
300 (5 min), or 60 (1 min).
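
To make the interval mapping concrete, here is a minimal Python sketch of how a millisecond timestamp could be assigned to a bucket for each interval. The names `INTERVALS`, `bucket_start` and `buckets_for` are only for illustration, and aligning buckets to the Unix epoch is an assumption (it means the weekly buckets start on a Thursday, since 1970-01-01 was a Thursday).

```python
# Supported intervals in seconds, as listed above.
INTERVALS = (604800, 86400, 3600, 1800, 600, 300, 60)


def bucket_start(ts_ms, interval_s):
    """Return the start (in ms) of the epoch-aligned bucket containing ts_ms."""
    interval_ms = interval_s * 1000
    return ts_ms - (ts_ms % interval_ms)


def buckets_for(ts_ms):
    """Map every supported interval to the bucket start that contains ts_ms."""
    return {interval_s: bucket_start(ts_ms, interval_s) for interval_s in INTERVALS}


if __name__ == "__main__":
    # A point at 90.5 s after the epoch falls in the second '1 min' bucket.
    print(bucket_start(90_500, 60))
```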

My initial design is:
1. a table for storing raw data
2. a table for each sensor interval
3. sensor as row
4. timestamp as column

But the current problem is with deleting data. Let's say that I have
stored 120 data points, 1 point every second for 2 minutes. Each interval is
populated with the last data point received in that interval.
This means:

120 columns in the raw table
2 columns in the '1 min' interval table
1 column in each of the other interval tables
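
The counts above can be reproduced with a small in-memory sketch in Python. Plain dicts stand in for the Cassandra tables here, and the names `downsample`, `base` and `counts` are hypothetical. The sketch assumes the 120 points start exactly on an hour boundary; if they straddled a bucket boundary, the coarser tables would hold more than one column.

```python
# Supported intervals in seconds.
INTERVALS = (604800, 86400, 3600, 1800, 600, 300, 60)


def downsample(raw, interval_s):
    """Keep the last point of each bucket.

    raw maps ts_ms -> value; the result maps bucket_start_ms -> (ts_ms, value).
    """
    interval_ms = interval_s * 1000
    out = {}
    for ts in sorted(raw):
        # Later timestamps overwrite earlier ones in the same bucket.
        out[ts - ts % interval_ms] = (ts, raw[ts])
    return out


# Hypothetical start time, chosen to sit on an hour boundary (in ms).
base = 1_382_598_000_000
# 120 points, one per second for 2 minutes.
raw = {base + i * 1000: float(i) for i in range(120)}
counts = {interval_s: len(downsample(raw, interval_s)) for interval_s in INTERVALS}

if __name__ == "__main__":
    print(len(raw), counts)
```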

Let's say that I delete one data point. This means that I have to get all
the interval data that the point belongs to, and also get the raw data around
the deleted point, to either update or remove the data in the interval tables.
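
A rough sketch of that single-point delete, again in Python with dicts standing in for the tables. All function names are hypothetical. It assumes each interval column also stores the timestamp of the raw point it was copied from; under that assumption an interval only needs repair when the deleted point is the one it currently holds, because every interval keeps only its last point.

```python
INTERVALS = (604800, 86400, 3600, 1800, 600, 300, 60)


def bucket_of(ts_ms, interval_s):
    """Start (ms) of the epoch-aligned bucket containing ts_ms."""
    interval_ms = interval_s * 1000
    return ts_ms - ts_ms % interval_ms


def build_rollups(raw):
    """Build every interval table: bucket_start -> (source_ts, value)."""
    rollups = {interval_s: {} for interval_s in INTERVALS}
    for ts in sorted(raw):
        for interval_s in INTERVALS:
            rollups[interval_s][bucket_of(ts, interval_s)] = (ts, raw[ts])
    return rollups


def last_in_bucket(raw, start_ms, interval_s):
    """Last remaining raw point in [start_ms, start_ms + interval), or None."""
    end_ms = start_ms + interval_s * 1000
    candidates = [ts for ts in raw if start_ms <= ts < end_ms]
    if not candidates:
        return None
    ts = max(candidates)
    return (ts, raw[ts])


def delete_point(raw, rollups, ts_ms):
    """Delete one raw point and repair only the interval columns that held it."""
    if ts_ms not in raw:
        return
    del raw[ts_ms]
    for interval_s in INTERVALS:
        start = bucket_of(ts_ms, interval_s)
        stored = rollups[interval_s].get(start)
        if stored is None or stored[0] != ts_ms:
            continue  # this interval was populated from a different point
        replacement = last_in_bucket(raw, start, interval_s)
        if replacement is None:
            del rollups[interval_s][start]  # bucket is now empty
        else:
            rollups[interval_s][start] = replacement
```

In a real cluster the scan in `last_in_bucket` would be a slice query on the raw row, so the cost depends on how wide the bucket is.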

We also support deleting data by time range, which will probably be an even
more complex operation.
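
One way to think about the range case, sketched in Python with dicts standing in for the tables (all names are hypothetical, and the range is assumed to be half-open, `[start_ms, end_ms)`): buckets that lie entirely inside the deleted range can be dropped without reading any raw data, and at most two edge buckets per interval need to be recomputed.

```python
INTERVALS = (604800, 86400, 3600, 1800, 600, 300, 60)


def build_rollups(raw):
    """Build every interval table: bucket_start -> (source_ts, value)."""
    rollups = {interval_s: {} for interval_s in INTERVALS}
    for ts in sorted(raw):
        for interval_s in INTERVALS:
            step = interval_s * 1000
            rollups[interval_s][ts - ts % step] = (ts, raw[ts])
    return rollups


def recompute(raw, table, bucket_ms, step_ms):
    """Reset one bucket to the last remaining raw point, or drop it if empty."""
    candidates = [ts for ts in raw if bucket_ms <= ts < bucket_ms + step_ms]
    if candidates:
        ts = max(candidates)
        table[bucket_ms] = (ts, raw[ts])
    else:
        table.pop(bucket_ms, None)


def delete_range(raw, rollups, start_ms, end_ms):
    """Delete raw points in [start_ms, end_ms) and repair the interval tables."""
    for ts in [t for t in raw if start_ms <= t < end_ms]:
        del raw[ts]
    for interval_s, table in rollups.items():
        step = interval_s * 1000

        def covered(bucket_ms):
            # True when the whole bucket lies inside the deleted range.
            return start_ms <= bucket_ms and bucket_ms + step <= end_ms

        # Fully covered buckets can be dropped outright.
        for bucket_ms in [b for b in table if covered(b)]:
            del table[bucket_ms]
        # Only the buckets containing the two range edges can be partly covered.
        first = start_ms - start_ms % step
        last = (end_ms - 1) - (end_ms - 1) % step
        for bucket_ms in {first, last}:
            if not covered(bucket_ms):
                recompute(raw, table, bucket_ms, step)
```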

Is this design correct, or is there a better design for modeling the
data?

Ahmy Yulrizka
