cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Robert Wille <>
Subject Getting the counters with the highest values
Date Sun, 23 Nov 2014 15:41:36 GMT
I’m working on moving a bunch of counters out of our relational database to Cassandra. For
the most part, Cassandra is a very nice fit, except for one feature on our website. We manage
a time series of view counts for each document, and display a list of the most popular documents
in the last seven days. This seems like a pretty strong anti-pattern for Cassandra, but also
seems like something a lot of people would want to do. If you’re keeping counters, its pretty
likely that you’d want to know which ones have the highest counts. 

Here’s what I came up with to implement this feature. Create a counter table with primary
key (doc_id, day) and a single counter. Whenever a document is viewed, increment the counter
for the document for today and the previous six days. Sometime after midnight each day, compile
the counters into a table with primary key (day, count, doc_id) and no additional columns.
For each partition in the counter table, I would sum up the counters, delete any counters
that are over a week old, and put the sum into the second table with day = today. When I query
the table, i would ask for data where day = yesterday. During the compilation process, I would
delete old partitions. In theory I’d only need two partitions. One that is being built,
and one for querying.

I’d be interested to hear critiques on this strategy, as well as hearing how other people
have implemented a "most-popular" feature using Cassandra counters.


View raw message