cassandra-user mailing list archives

From Ali Akhtar <ali.rac...@gmail.com>
Subject Effective partition key for time series data, which allows range queries?
Date Tue, 28 Mar 2017 00:47:28 GMT
I have a use case where the data for individual users is being tracked, and
every 15 minutes or so, the data for the past 15 minutes is inserted into
the table.

The table schema looks like:
user id, timestamp, foo, bar, etc.

Where foo, bar, etc are the items being tracked, and their values over the
past 15 minutes.

I initially planned to use the user id as the partition key of the table.
But I realized that this may cause really wide rows (tracking for 24
hours means 96 records inserted, 1 for each 15-minute window; over 1 year
this means about 35k records per user, over 2 years about 70k, etc.).
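
As a rough check of this estimate, here is a short Python sketch. It assumes
exactly one record per 15-minute window and 365-day years; the function name
is only illustrative.

```python
# Back-of-the-envelope estimate of records per user.
# Assumes exactly one record per 15-minute window and 365-day years.
MINUTES_PER_DAY = 24 * 60
WINDOW_MINUTES = 15


def records_per_user(days: int) -> int:
    """Number of 15-minute records written for one user over `days` days."""
    return (MINUTES_PER_DAY // WINDOW_MINUTES) * days


print(records_per_user(1))        # 96
print(records_per_user(365))      # 35040
print(records_per_user(2 * 365))  # 70080
```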

I know the limit of wide rows is billions of records, but I've heard that
the practical limit is much lower.

So I considered using a composite primary key: (user, timestamp)

If I'm correct, the above should create a new row for each user & timestamp
logged.

However, will I still be able to do range queries on the timestamp, e.g. to
return the data for the last week?

E.g.:
select * from data where user_id = 'foo' and timestamp >= '<1 month ago>' and timestamp <= '<today>';
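
To make the intended access pattern concrete, below is a toy in-memory model
in Python. It is not Cassandra code. It assumes that in a key declared as
(user, timestamp) the user acts as the partition key and the timestamp as a
clustering column kept sorted within that partition; the class and variable
names are hypothetical.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timedelta


class UserPartition:
    """Toy model of one user's partition: records kept sorted by timestamp."""

    def __init__(self):
        self.timestamps = []  # sorted list of datetimes
        self.payloads = []    # payloads[i] belongs to timestamps[i]

    def insert(self, ts, payload):
        i = bisect.bisect_left(self.timestamps, ts)
        if i < len(self.timestamps) and self.timestamps[i] == ts:
            self.payloads[i] = payload  # same (user, timestamp): overwrite
        else:
            self.timestamps.insert(i, ts)
            self.payloads.insert(i, payload)

    def range(self, start, end):
        """Records with start <= timestamp <= end, in timestamp order."""
        lo = bisect.bisect_left(self.timestamps, start)
        hi = bisect.bisect_right(self.timestamps, end)
        return list(zip(self.timestamps[lo:hi], self.payloads[lo:hi]))


# user_id -> partition (assumption: the user id selects the partition)
table = defaultdict(UserPartition)

# Three days of 15-minute records for one user.
first = datetime(2017, 3, 1)
for n in range(3 * 96):
    table["foo"].insert(first + timedelta(minutes=15 * n), {"foo": n, "bar": 2 * n})

# Range query restricted to a single user, like the select above.
day_two = table["foo"].range(datetime(2017, 3, 2), datetime(2017, 3, 2, 23, 59, 59))
print(len(day_two))  # 96
```

In this model a range query only works once the user is fixed, because the
ordering by timestamp exists only inside that user's records.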
