This solution is very "sql-like", meaning that you query what you want when you need it.
Unfortunately this will probably not scale as your data grows, you might want to consider de-normalizing your data. You could maintain a min/max average in the application that inserts the data, or have a batch that runs periodically to precompute the data.

On Thu, Sep 3, 2015 at 6:59 AM, Kevin Burton <burton@spinn3r.com> wrote:

Check out kairosd for a time series db on Cassandra.

On Aug 31, 2015 7:12 AM, "Peter Lin" <woolfel@gmail.com> wrote:

I didn't realize they had added max and min as stock functions.

to get the sample time. you'll probably need to write a custom function. google for it and you'll find people that have done it.

On Mon, Aug 31, 2015 at 10:09 AM, Pål Andreassen <pal.andreassen@bouvet.no> wrote:

Cassandra 2.2 has min and max built-in. My problem is getting the corresponding sample time as well.

 

Pål Andreassen

54°23'58"S 3°18'53"E

Konsulent

Mobil +47 982 85 504

pal.andreassen@bouvet.no

 

Bouvet Norge AS
Avdeling Grenland

Uniongata 18, Klosterøya

N-3732 Skien

Tlf +47 23 40 60 00 

bouvet.no

 

From: Peter Lin [mailto:woolfel@gmail.com]
Sent: mandag 31. august 2015 16.09
To: user@cassandra.apache.org
Subject: Re: Cassandra 2.2 for time series

 

 

Unlike SQL, CQL doesn't have built-in functions like max/min

In the past, people would create summary tables to keep rolling stats for reports/analytics. In cql3, there's user defined functions, so you can write a function to do max/min

http://cassandra.apache.org/doc/cql3/CQL-2.2.html#selectStmt
http://cassandra.apache.org/doc/cql3/CQL-2.2.html#udfs

 

On Mon, Aug 31, 2015 at 9:48 AM, Pål Andreassen <pal.andreassen@bouvet.no> wrote:

Hi

 

I’m currently evaluating Cassandra as a potiantial database for storing time series data from lots of devices (IoT type of scenario).

Currently we have a few thousand devices with X channels (measurements) that they report at different intervals (from 5 minutes and up).

 

I’ve created as simple test table to store the data:

 

CREATE TABLE DataRaw(

  channelId int,

  sampleTime timestamp,

  value double,

  PRIMARY KEY (channelId, sampleTime)

) WITH CLUSTERING ORDER BY (sampleTime ASC);

 

This schema seems to work ok, but I have queries that I need to support that I cannot easily figure out how to perform (except getting all the data out and iterate it myself).

 

Query 1: For max and min queries, I not only want the maximum/minimum value, but also the corresponding timestamp.

 

sampleTime          value
2015-08-28 00:00    10
2015-08-28 01:00    15
2015-08-28 02:00    13


I'd like the max query to return both 2015-08-28 01:00 and 15. SELECT sampleTime, max(value) FROM DataRAW return the max value, but the first sampleTime.

Also I wonder if Cassandra has built-in support for interpolation/extrapolation. Some sort of group by hour/day/week/month and even year function.

 

Query 2: Give me hourly averages for channel X for yesterday. I’d expect to get 24 values each of which is the hourly average. Or give my daily averages for last year for a given channel. Should return 365 daily averages.

 

Best regards

 

Pål Andreassen

54°23'58"S 3°18'53"E

Konsulent

Mobil +47 982 85 504

pal.andreassen@bouvet.no

 

Bouvet Norge AS
Avdeling Grenland

Uniongata 18, Klosterøya

N-3732 Skien

Tlf +47 23 40 60 00 

bouvet.no