cassandra-user mailing list archives

From graham sanderson <>
Subject Re: best practices for time-series data with massive amounts of records
Date Fri, 06 Mar 2015 23:06:20 GMT
Note that using static column(s) for the “head” value, with trailing TTLed values behind
it, is something we’re considering. This is especially nice if your head state includes,
say, a map that is updated by small deltas (individual keys).

We have not yet studied the effect of static columns on, say, DTCS (DateTieredCompactionStrategy).
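
A minimal sketch of the head-plus-trail layout described above, written as CQL statements held in Python strings so it does not depend on any particular driver. The table name `user_state`, the column names, the `map<text, text>` types, and the 7-day TTL are all assumptions for illustration and are not taken from a real schema.

```python
# Hypothetical schema: table name, column names and TTL are illustrative assumptions.
TRAIL_TTL_SECONDS = 7 * 24 * 3600  # assumed retention for the trailing values

CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS user_state (
    user_id    text,
    event_time timeuuid,
    head       map<text, text> STATIC,  -- one "head" value per partition
    delta      map<text, text>,         -- per-event change, expires via TTL
    PRIMARY KEY (user_id, event_time)
) WITH CLUSTERING ORDER BY (event_time DESC)
"""

# Update a single key of the static head map; only the partition key is needed.
UPDATE_HEAD_KEY = "UPDATE user_state SET head[?] = ? WHERE user_id = ?"

# Append a trailing row that expires on its own after the TTL.
INSERT_TRAIL = (
    "INSERT INTO user_state (user_id, event_time, delta) "
    "VALUES (?, now(), ?) USING TTL {ttl}"
).format(ttl=TRAIL_TTL_SECONDS)


def statements_for_delta(user_id, key, value):
    """Return (cql, params) pairs that apply one small delta to the head
    and record the same delta as a TTLed trailing row."""
    return [
        (UPDATE_HEAD_KEY, (key, value, user_id)),
        (INSERT_TRAIL, (user_id, {key: value})),
    ]
```

The pairs returned by `statements_for_delta` could be handed to whatever driver is in use, for example as prepared statements. Whether the two writes should go in one batch is a separate design choice that this sketch does not address.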

> On Mar 6, 2015, at 4:42 PM, Clint Kelly <> wrote:
> Hi all,
> Thanks for the responses, this was very helpful.
> I don't know yet what the distribution of clicks and users will be, but I expect to see
> a few users with an enormous amount of interactions and most users having very few. The idea
> of doing some additional manual partitioning, and then maintaining another table that contains
> the "head" partition for each user makes sense, although it would add additional latency when
> we want to get say the most recent 1000 interactions for a given user (which is something
> that we have to do sometimes for applications with tight SLAs).
> FWIW I doubt that any users will have so many interactions that they exceed what we could
> reasonably put in a row, but I wanted to have a strategy to deal with this.
> Having a nice design pattern in Cassandra for maintaining a row with the N-most-recent
> interactions would also solve this reasonably well, but I don't know of any way to implement
> that without running batch jobs that periodically clean out data (which might be okay).
> Best regards,
> Clint
> On Tue, Mar 3, 2015 at 8:10 AM, mck < <>>
> > Here "partition" is a random digit from 0 to (N*M)
> > where N=nodes in cluster, and M=arbitrary number.
> Hopefully it was obvious, but here (unless you've got hot partitions),
> you don't need N.
> ~mck
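
A minimal sketch of the bucketing scheme quoted above, using only M as mck suggests: each write goes to a random bucket in 0..M-1, and a read for the most recent events has to query every bucket and merge the results. The names `write_key`, `read_keys`, and `most_recent`, the value of `M`, and the in-memory `store` dict standing in for Cassandra are all assumptions for illustration.

```python
import heapq
import random
from itertools import islice

M = 8  # assumed bucket count per user; tune to the expected partition size


def write_key(user_id, rng=random):
    """Pick the partition key for a new event: (user_id, bucket in 0..M-1)."""
    return (user_id, rng.randrange(M))


def read_keys(user_id):
    """All partition keys that must be queried to see every event for a user."""
    return [(user_id, bucket) for bucket in range(M)]


def most_recent(store, user_id, limit):
    """Merge per-bucket event lists (each sorted newest-first) and keep `limit`.

    `store` maps a partition key to a list of (timestamp, payload) tuples
    and is a stand-in for one query per bucket against the real table.
    """
    per_bucket = [store.get(key, []) for key in read_keys(user_id)]
    merged = heapq.merge(*per_bucket, reverse=True)
    return list(islice(merged, limit))
```

The fan-out on read is the latency cost mentioned earlier in the thread: fetching the latest 1000 interactions means up to M queries rather than one, so M is a trade-off between partition size and read cost.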
