incubator-cassandra-user mailing list archives

From Oleg Dulin <oleg.du...@gmail.com>
Subject Data aggregation -- help me design a solution
Date Tue, 21 Aug 2012 20:08:18 GMT
Here are my requirements.

We use Cassandra.

I get millions of invoice line items into the system. As I load them, I 
need to build up some data structures:

* Invoice line items by invoice id (each line item has an invoice id on 
it), with total dollar value
* Invoice line items by customer id, with total dollar value
* Invoice line items by territory, with total dollar value
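To make the three aggregations concrete, here is a minimal Java sketch of the "do the math in Java" side: given a batch of line items, it sums the dollar value per invoice id, customer id, or territory. The `LineItem` record, its field names, and holding amounts as cents in a `long` are assumptions for illustration, not an existing schema.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical shape of an incoming line item. Amounts are held in cents
// (a long) so totals are exact, unlike double arithmetic.
record LineItem(String lineItemId, String invoiceId, String customerId,
                String territory, long amountCents) {}

final class Totals {
    // Sums amountCents per key, where `attribute` extracts the grouping key
    // (invoice id, customer id, or territory) from each line item.
    static Map<String, Long> totalBy(List<LineItem> items,
                                     Function<LineItem, String> attribute) {
        return items.stream().collect(
                Collectors.groupingBy(attribute,
                        Collectors.summingLong(LineItem::amountCents)));
    }
}
```

For example, `Totals.totalBy(items, LineItem::territory)` yields per-territory totals. Because it recomputes from the stored items, corrections are picked up automatically, at the cost of rereading every line item on each run.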

In all of those cases, what we want is to see the total for a given 
attribute; that's all there is to it.

Line items may change daily; for example, a line item's territory may 
change, or its values may be corrected. In that case I need to update 
the aggregations accordingly.

Here are my ideas:

- I can use counters and store the data in buckets
- I can just store the data in buckets and do the math in Java
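The bucketed-counter idea can be sketched as follows. This is an in-memory model only, assuming each logical total is split across a fixed number of counter cells so that increments to a hot key (such as a busy territory) are spread across rows; in Cassandra each `increment` would become a counter update against a hypothetical table keyed by aggregate key and bucket. `BucketedCounters` and its methods are illustrative names, not a Cassandra API.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for bucketed counters (hypothetical, not a driver API).
// Each logical total such as "territory:east" is split across numBuckets
// cells named "<key>#<bucket>"; a real table would key rows by (key, bucket).
final class BucketedCounters {
    private final int numBuckets;
    private final Map<String, Long> cells = new HashMap<>();

    BucketedCounters(int numBuckets) {
        this.numBuckets = numBuckets;
    }

    // Deterministically map a line item id to a bucket in [0, numBuckets).
    int bucketFor(String lineItemId) {
        return Math.floorMod(lineItemId.hashCode(), numBuckets);
    }

    // Models incrementing one counter cell; deltaCents may be negative.
    void increment(String aggregateKey, String lineItemId, long deltaCents) {
        String cell = aggregateKey + "#" + bucketFor(lineItemId);
        cells.merge(cell, deltaCents, Long::sum);
    }

    // Reading a total means summing every bucket for that key.
    long total(String aggregateKey) {
        long sum = 0;
        for (int b = 0; b < numBuckets; b++) {
            sum += cells.getOrDefault(aggregateKey + "#" + b, 0L);
        }
        return sum;
    }
}
```

Reads cost one lookup per bucket, so the bucket count trades write spreading against read fan-out. Note that Cassandra counter increments are not idempotent: a write retried after a timeout can be applied twice, which matters when totals must be exact.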

In both cases the challenge is that items can be updated, which means 
I need to look up the current version of an item and decide how to 
proceed. That puts a huge performance penalty on the application (we 
receive millions of line items and need to process them in a timely 
fashion).
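The lookup-and-decide step described above can be sketched like this, assuming the previous version of each line item is kept in a store keyed by line item id (modelled here as an in-memory `HashMap`). An identical resend is skipped; a changed item first has its old contribution subtracted from every total it touched and then its new contribution added, so a territory change moves the amount from the old territory's total to the new one. All class, field, and key names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical line item shape; amounts in cents.
record LineItem(String lineItemId, String invoiceId, String customerId,
                String territory, long amountCents) {}

// Sketch of "look up the current version and decide how to proceed".
// `latest` stands in for a lookup table keyed by line item id;
// `totals` stands in for the aggregate store (e.g. counters).
final class IncrementalAggregator {
    final Map<String, LineItem> latest = new HashMap<>();
    final Map<String, Long> totals = new HashMap<>();

    // The three aggregate keys a line item contributes to.
    private static List<String> keysOf(LineItem item) {
        return List.of("invoice:" + item.invoiceId(),
                       "customer:" + item.customerId(),
                       "territory:" + item.territory());
    }

    // Adds (sign = +1) or subtracts (sign = -1) the item's amount.
    private void add(LineItem item, long sign) {
        for (String key : keysOf(item)) {
            totals.merge(key, sign * item.amountCents(), Long::sum);
        }
    }

    void apply(LineItem incoming) {
        // put() returns the previous version: the read-before-write.
        LineItem previous = latest.put(incoming.lineItemId(), incoming);
        if (incoming.equals(previous)) {
            return; // unchanged resend: nothing to do
        }
        if (previous != null) {
            add(previous, -1); // reverse the old contribution
        }
        add(incoming, +1); // apply the new contribution
    }
}
```

The `latest.put` call is exactly the per-item read (plus write) that becomes expensive at millions of items; the sketch only pins down the decision logic, not a way to avoid that read.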

Help me out here -- any ideas on how I could design this in Cassandra?


Regards,
Oleg


