incubator-cassandra-user mailing list archives

From Janne Jalkanen <janne.jalka...@ecyrd.com>
Subject Re: Billions of counters
Date Thu, 13 Jun 2013 18:51:30 GMT

Hi!

We have a similar situation of millions of events on millions of items. It turns out that this
isn't really a problem, because there tends to be a very strong power-law distribution: very
few of the items get a lot of hits, some get some, and the majority get no hits in any given
period (though most of them do get hits every now and then).  So it's basically a sparse
multidimensional array, and it turns out that Cassandra is pretty good at storing those.  We
just treat a missing counter column as zero, and add a counter only when necessary.  To avoid
I/O, we also do some statistical sampling for certain counters where we don't need an exact figure.
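
To make the two ideas above concrete, here is a minimal sketch in Python. It is not our production code and it does not use any Cassandra client API: a plain dict stands in for the counter columns, and the class name, the key layout (product, location, week) and the sample rate are all assumptions made up for illustration. A missing key reads as zero, and with a sample rate below 1.0 only a fraction of hits is written and the read is scaled back up.

```python
import random


class SparseCounters:
    """In-memory stand-in for sparse counter columns (hypothetical, not a Cassandra API)."""

    def __init__(self, sample_rate=1.0, rng=None):
        if not 0.0 < sample_rate <= 1.0:
            raise ValueError("sample_rate must be in (0, 1]")
        self.sample_rate = sample_rate
        self.rng = rng or random.Random()
        self.cells = {}   # only counters that were actually hit exist here
        self.writes = 0   # number of increments that reached the store

    def hit(self, key):
        # With sampling enabled, skip most increments to save write I/O.
        if self.sample_rate < 1.0 and self.rng.random() >= self.sample_rate:
            return
        # The counter is created on first use; nothing is pre-allocated.
        self.cells[key] = self.cells.get(key, 0) + 1
        self.writes += 1

    def estimate(self, key):
        # A missing counter is treated as zero; sampled counts are scaled up.
        return self.cells.get(key, 0) / self.sample_rate


exact = SparseCounters()
for _ in range(3):
    exact.hit(("product-x", "location-y", "2013-W24"))

sampled = SparseCounters(sample_rate=0.1, rng=random.Random(42))
for _ in range(10_000):
    sampled.hit(("popular-product", "location-y", "2013-W24"))
```

The sampled counter is only an estimate, so this fits figures like "roughly how popular is this item" and not anything that has to be exact.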

YMMV, of course, but I'd look at the likelihood of all the products being purchased from the
same location during one week at least once and start the modeling from there. :)
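
As a rough starting point for that modeling, here is a back-of-the-envelope sketch in Python. The concrete numbers are placeholders I picked to match "millions of events/day, thousands of locations, millions of products"; they are not from either of our systems. Each event touches at most one (product, location, week) counter, so the number of non-empty counters per week can never exceed the number of events per week. The function also computes the expected number of non-empty counters if events were spread uniformly at random, which is the least favourable case; a skewed distribution touches fewer distinct counters.

```python
import math


def expected_occupied(cells, events):
    """Expected number of distinct cells hit when `events` land uniformly on `cells`."""
    if cells == 1:
        return 1.0 if events > 0 else 0.0
    # cells * (1 - (1 - 1/cells) ** events), written with log1p/expm1 for accuracy
    return -cells * math.expm1(events * math.log1p(-1.0 / cells))


# Hypothetical figures, for illustration only.
products = 3_000_000
locations = 2_000
events_per_day = 5_000_000

cells_per_week = products * locations    # every possible (product, location) pair
events_per_week = events_per_day * 7     # hard upper bound on non-empty counters
occupied = expected_occupied(cells_per_week, events_per_week)

print(f"possible counters per week: {cells_per_week:,}")
print(f"events per week (upper bound on non-empty counters): {events_per_week:,}")
print(f"expected non-empty counters, uniform case: {occupied:,.0f}")
print(f"fraction of the array that is non-empty: {occupied / cells_per_week:.4%}")
```

With numbers of this shape, well under one percent of the possible counters ever need to exist in a given week, which is why storing only the non-zero ones works out.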

/Janne

On 13 Jun 2013, at 21:19, Darren Smythe <darren1482@gmail.com> wrote:

> We want to precalculate counts for some common metrics for usage. We have events, locations,
> products, etc. The problem is we have millions of events/day, thousands of locations and
> millions of products.
> 
> We're trying to precalculate counts for some common queries like 'how many times was product
> X purchased in location Y last week'.
> 
> It seems like we'll end up with trillions of counters for even these basic permutations.
> Is this a cause for concern?
> 
> TIA
> 
> -- Darren

