cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Wenzel <mwen...@proheris.de>
Subject WG: How to sort result-set? / How to proper model a table?
Date Wed, 26 Jul 2017 13:44:37 GMT
Hey everyone,

I'm new to Cassandra, going my first steps, having a problem/question regarding sorting results
and proper data modelling.

First of all, I read the article "We Shall Have Order!" by Aaron Ploetz (1) to get a first
view on how Cassandra works.
I reproduced the example in the article with my own table.

DROP TABLE sensors;
CREATE TABLE sensors (
    timestamp BIGINT,
    name VARCHAR,
    value VARCHAR,
    unit VARCHAR,
    PRIMARY KEY (name, timestamp)
)
WITH gc_grace_seconds = 0
AND CLUSTERING ORDER BY (timestamp DESC);

I'm actual running Cassandra on a single node ([cqlsh 5.0.1 | Cassandra 3.11.0 | CQL spec
3.4.4 | Native protocol v4]).
Now some background information about my project:

I want to store all kinds of measuring-data from all kinds sensors. No matter if the sensor
is measuring a temperature, water flow, or whatever. Sensors always give a single value. Interpretation
has to be done afterwards by the user.
So in my example, I 'm measuring temperatures of my house which leads me to the following
data:

timestamp               name              value unit
2017-07-24 14-11-00     entrance-a        20    Celsius
2017-07-24 14-11-04     living-room       24    Celsius
2017-07-24 14-11-07     bath-room         22    Celsius
2017-07-24 14-11-15     bed-room          23    Celsius
2017-07-24 14-11-22     entrance-b        20    Celsius

I'm measuring time-triggered each 15 minutes. In order to have some kind of start and end
for each process, I decided to measure the entrance twice with different named sensors (entrance
a and b). So above is one set of measuring-data, created by a single process.
I'd say this is just another perfect example of what Aaron Ploetz describes in his article.

When I query Cassandra the result set is not sorted by timestamp as long as I won't use the
primary key in my WHERE clause.
When I ask myself: "What will I query Cassandra for?" I'm always coming up with the same typical
thoughts:

*         LIST all measuring's in a specific timespan ORDERED BY timestamp ASC/DESC

o   Requires ALLOW FILTERING

o   Won't be sorted

*         LIST all measuring's for a specific sensor ORDERED BY timestamp ASC/DESC

o   Sorted result. OK.

*         And stuff the future will bring which I simply don't know now.


So in order to query Cassandra for measuring's in a specific timestamp I can't find a solid
solution. My first idea was:

*         Add a column sequence which can be used to bundle a set of measuring's

DROP TABLE sensors;
CREATE TABLE sensors (
    timestamp BIGINT,
    name VARCHAR,
    value VARCHAR,
    unit VARCHAR,
    sequence INT,
    PRIMARY KEY (sequence, timestamp)
)
WITH gc_grace_seconds = 0
AND CLUSTERING ORDER BY (timestamp DESC);


o   I won't need to measure the entrance twice

o   I can query for a timespan as long as the timespan is within a sequence.

?  But when I query a timespan containing more than a single sequence, then the result set
is not correct sorted again

sequence timestamp           name              value unit
123   2017-07-24 14-11-22     entrance-b        20    Celsius
123   2017-07-24 14-11-15     bed-room          23    Celsius
123   2017-07-24 14-11-07     bath-room         22    Celsius
123   2017-07-24 14-11-04     living-room       24    Celsius
123   2017-07-24 14-11-00     entrance-a        20    Celsius
124   2017-07-24 15-11-22     entrance-b        22    Celsius
124   2017-07-24 15-11-15     bed-room          25    Celsius
124   2017-07-24 15-11-07     bath-room         24    Celsius
124   2017-07-24 15-11-04     living-room       26    Celsius
124   2017-07-24 15-11-00     entrance-a        22    Celsius


o   Besides: it's not recommended to use a "dummy" column especially not as primary or clustering
key.

How to solve this problem?
I believe, I can't be the only one who got this requirement. Imho "Sort it on the client-side"
can't be the solution. As soon as data gets bigger we simply can't "just" sort on a client
side.
So my next idea was to use the table as overall data storage and create another table and
periodically transfer data from the main to the child table. But I believe I'll get the same
problem because Cassandra simply don't sort as an RDBMS. So here must be an idea behind the
philosophy of Cassandra.

Can anyone help me out?

Best regards
Mike Wenzel


(1)    https://www.datastax.com/dev/blog/we-shall-have-order

Mime
View raw message