My ColumnFamily is defined as follows:
CREATE TABLE UserProfileHistory(
timestamp bigint, -- millis since epoch
data text, -- JSON
PRIMARY KEY (username, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);
Each insert on the username adds to the wide row. The most recent profile history being able to be retrieved by
SELECT * FROM UserProfileHistory WHERE username=:username LIMIT 1;
For some reporting needs I need to fetch the entire history, and I need to do it in ASC order instead of DESC.
One option is to do the sorting in code, collect N results, sort on the timestamps accordingly. Given the row is of N length, that could start to put an undo memory burden in my application layer, and I would like to avoid that if possible opting instead for Cassandra to perform the work.
So I am leaning towards this option:
2) min timestamp seek + ORDER BY
To start the process my initial timestamp would be 01-01-1970T12:00:00+0000 (assume that is in milliseconds, aka 0) I would then issue my query:
SELECT * FROM UserProfileHistory WHERE username=:username AND timestamp > :milliseconds ORDER BY timestamp ASC LIMIT 100
Once I have those initial results I would just pick my last timestamp from the result set and + 1 on it and run the query again until I received 0 results.
The CQL works and returns my results as I expect. This will probably only be run once every 24 hours, maybe every 12 hours; point being, not often.
Am I setting myself up for a disaster down the line?