cassandra-user mailing list archives

From Jens Rantil <jens.ran...@tink.se>
Subject Re: best practices for time-series data with massive amounts of records
Date Tue, 03 Mar 2015 12:32:24 GMT
Hi,

I have not done anything similar, but I have some comments:

On Mon, Mar 2, 2015 at 8:47 PM, Clint Kelly <clint.kelly@gmail.com> wrote:

> The downside of this approach is that we can no longer do a simple
> continuous scan to get all of the events for a given user.
>

Sure, but would you really do that in real time anyway? :) If you have
billions of events, that's not going to scale anyway. Also, if you have
100000 events per bucket, the latency introduced by batching should be
manageable.
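
A minimal sketch of reading a user's events bucket by bucket, using an
in-memory dict as a stand-in for a Cassandra table. The one-bucket-per-UTC-day
granularity, the key layout, and all function names are assumptions for
illustration, not the schema from the original post.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Stand-in for a table keyed by (user_id, time_bucket) with events
# clustered by timestamp inside each partition. Hypothetical layout.
events_by_bucket = defaultdict(list)


def time_bucket(ts: datetime) -> str:
    """Map a timestamp to its bucket (assumed here: one bucket per UTC day)."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def write_event(user_id: str, ts: datetime, payload: str) -> None:
    """Append an event to the partition for (user, bucket)."""
    events_by_bucket[(user_id, time_bucket(ts))].append((ts, payload))


def read_all_events(user_id: str, buckets):
    """Yield a user's events one bucket at a time, oldest first in each bucket.

    Each loop iteration corresponds to one bounded partition read instead of
    a single unbounded scan over everything the user ever logged.
    """
    for bucket in buckets:
        rows = events_by_bucket.get((user_id, bucket), [])
        for ts, payload in sorted(rows, key=lambda row: row[0]):
            yield ts, payload
```

The caller supplies the list of buckets to visit, so the cost of a full
read grows with the number of buckets rather than with one huge partition.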


> Some users may log lots and lots of interactions every day, while others
> may interact with our application infrequently,
>

This is another reason to split them up into buckets: it makes the cluster
partitions more manageable and homogeneous.


> so I'd like a quick way to get the most recent interaction for a given
> user.
>

For this you could actually have a second table that stores the
last_time_bucket for a user. Upon event write, you could simply do an
update of the last_time_bucket. You could even have an index of all time
buckets per user if you want.
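
A minimal sketch of that lookup table and the optional per-user bucket
index, again with in-memory dicts standing in for Cassandra tables. The
names and the string bucket format (e.g. "2015-03-03") are assumptions. The
guard against out-of-order writes is an addition of this sketch; the text
above describes a plain update on every write.

```python
from collections import defaultdict

# user_id -> most recent time bucket that received a write (hypothetical table)
last_time_bucket = {}
# user_id -> every bucket that holds events for that user (hypothetical index)
buckets_by_user = defaultdict(set)


def record_write(user_id: str, bucket: str) -> None:
    """Update both lookup structures when an event is written.

    Buckets formatted as YYYY-MM-DD compare correctly as strings, so the
    comparison keeps the newest bucket even if writes arrive out of order.
    """
    if bucket > last_time_bucket.get(user_id, ""):
        last_time_bucket[user_id] = bucket
    buckets_by_user[user_id].add(bucket)


def latest_bucket(user_id: str):
    """Return the newest bucket for a user, or None if the user has no events."""
    return last_time_bucket.get(user_id)
```

Fetching the most recent interaction then takes two small reads: one for
the latest bucket and one for the newest row inside that bucket.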


> Has anyone used different approaches for this problem?
>
> The only thing I can think of is to use the second table schema described
> above, but switch to an order-preserving hashing function, and then
> manually hash the "id" field.  This is essentially what we would do in
> HBase.
>

As you might already know, order-preserving hashing is _not_
considered best practice in the Cassandra world.

Cheers,
Jens


-- 
Jens Rantil
Backend engineer
Tink AB

Email: jens.rantil@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se

