cassandra-user mailing list archives

From Guillermo Winkler
Subject Re: Data aggregation -- help me design a solution
Date Tue, 21 Aug 2012 21:57:11 GMT

If you have the aggregates in counters you only need to read the current
counter when adding/removing invoice lines.

In this situation you only need to make sure that this sequence:

+ Read current counter value
+ Update current value according to newly created/updated lines

is done safely, so that concurrent updates don't mess up the current counter.
Assuming you don't need to have the counters updated in "real time" you can
also batch the counter update in Java/Redis/Whatever and do the updates in
C* less often.
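A rough sketch of the batching idea, again in Python and again with made-up names: `CounterBatcher` accumulates deltas per counter key in memory and hands them to a `flush_fn` callback once enough distinct keys are pending. The callback is where the real counter writes would go; here it is just a hypothetical function you supply. The sketch is single-threaded and does not cover losing pending deltas if the process dies.

```python
from collections import defaultdict


class CounterBatcher:
    """Accumulates counter deltas in memory and writes them out less often."""

    def __init__(self, flush_fn, max_pending=1000):
        self._pending = defaultdict(int)  # counter key -> accumulated delta
        self._flush_fn = flush_fn         # hypothetical callback: receives {key: delta}
        self._max_pending = max_pending   # flush once this many keys are pending

    def add(self, key, delta):
        # Many small changes to the same counter collapse into one delta.
        self._pending[key] += delta
        if len(self._pending) >= self._max_pending:
            self.flush()

    def flush(self):
        # Skip keys whose changes cancelled out; they need no write at all.
        batch = {k: d for k, d in self._pending.items() if d != 0}
        self._pending.clear()
        if batch:
            self._flush_fn(batch)
```

With this, a thousand corrections to the same territory between two flushes become one counter update instead of a thousand. You would also call `flush()` on a timer or at the end of a load so nothing stays pending for long.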


On Tue, Aug 21, 2012 at 5:08 PM, Oleg Dulin wrote:

> Here are my requirements.
> We use Cassandra.
> I get millions of invoice line items into the system. As I load them I
> need to build up some data structures.
> * Invoice line items by invoice id (each line item has an invoice id on
> it), with total dollar value
> * Invoice line items by customer id, with total dollar value
> * Invoice line items by territory, with total dollar value
> In all of those cases, what we want is to see the total by a given
> attribute, that's all there is to it.
> Line items may change daily, i.e. a territory may change or they may
> correct the values. In this case I need to update the aggregations
> accordingly.
> Here are my ideas:
> - I can use counters and store the data in buckets
> - I can just store the data in buckets and do the math in Java
> In both cases the challenge is that the items can be updated. Which means
> I need to look up a current version of an item and decide how to proceed.
> That puts a huge performance penalty on the application (# of line items we
> receive is in the millions and we need to process them in a timely fashion).
> Help me out here -- any ideas on how I could design this in Cassandra ?
> Regards,
> Oleg
