cassandra-user mailing list archives

From Milind Parikh <milindpar...@gmail.com>
Subject Re: Data aggregation -- help me design a solution
Date Tue, 21 Aug 2012 20:50:45 GMT
1. Assuming that the majority of the line items are new, and

2. The lookup of an existing line item will dictate the performance of the
system, because reads are slower than writes in C*.

3. Assuming that you are using counters in C*

Therefore, eliminate that problem by implementing a Bloom filter or a similar
structure (e.g. a stable Bloom filter) to figure out whether you actually need
to go to C* at all FOR READING an existing line item.
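
To make the idea concrete, here is a minimal sketch of such a filter in Python. Everything in it is an assumption for illustration: the class, the `needs_read` helper, the sizing defaults, and the use of string line-item ids are hypothetical and not part of any Cassandra API. It is a plain Bloom filter, not the stable variant, so it never forgets entries and would need to be sized for the expected volume and rebuilt after a restart.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits=1 << 20, num_hashes=5):
        # Sizing defaults are arbitrary placeholders, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive all bit positions from two 64-bit halves
        # of a single SHA-256 digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


def needs_read(seen, line_item_id):
    """Return True if the item may already exist, so a C* read is required.

    Hypothetical helper: a definite "not seen" lets the caller skip the read
    and go straight to the write path.
    """
    if seen.might_contain(line_item_id):
        return True
    seen.add(line_item_id)
    return False
```

A false positive only costs an unnecessary read; a new item is never mistaken for "definitely not seen", so no update is silently missed.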

IF YOU NEED TO GO TO C* FOR READS, handle that event (the act of getting a
line item that already existed) in a separate set of threads, DECRing
the chosen counters by the previous value of the invoice line item.
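
A rough sketch of that correction step, in Python. The dict named `totals` stands in for the C* counter columns, and the field names (`invoice_id`, `customer_id`, `territory`, `amount_cents`) and the `apply_line_item` function are assumptions made up for this example. Amounts are kept in cents because C* counters hold integers.

```python
from collections import defaultdict

# The three aggregations from the original question.
DIMENSIONS = ("invoice_id", "customer_id", "territory")


def apply_line_item(totals, new, old=None):
    """Apply one line item to the per-dimension totals.

    totals: mapping of (dimension, value) -> total in cents; a stand-in
            for counter columns, which would be incremented/decremented.
    new:    the incoming version of the line item.
    old:    the previously stored version, or None if the item is new.
    """
    if old is not None:
        # Back out the previous value from the buckets it was counted in.
        for dim in DIMENSIONS:
            totals[(dim, old[dim])] -= old["amount_cents"]
    # Count the new value in its (possibly different) buckets.
    for dim in DIMENSIONS:
        totals[(dim, new[dim])] += new["amount_cents"]


totals = defaultdict(int)
first = {"invoice_id": "inv-1", "customer_id": "c-9",
         "territory": "east", "amount_cents": 1000}
apply_line_item(totals, first)

# The item is corrected: territory and amount both change.
corrected = dict(first, territory="west", amount_cents=1500)
apply_line_item(totals, corrected, old=first)
```

Only the update path needs `old`, which is the part that requires the read; brand-new items take the increment-only branch.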


HTH
Regards
Milind



On Tue, Aug 21, 2012 at 1:08 PM, Oleg Dulin <oleg.dulin@gmail.com> wrote:

> Here are my requirements.
>
> We use Cassandra.
>
> I get millions of invoice line items into the system. As I load them I
> need to build up some data structures.
>
> * Invoice line items by invoice id (each line item has an invoice id on
> it), with total dollar value
> * Invoice line items by customer id, with total dollar value
> * Invoice line items by territory, with total dollar value
>
> In all of those cases, what we want is to see the total by a given
> attribute, that's all there is to it.
>
> Line items may change daily, i.e. a territory may change or they may
> correct the values. In this case I need to update the aggregations
> accordingly.
>
> Here are my ideas:
>
> - I can use counters and store the data in buckets
> - I can just store the data in buckets and do the math in Java
>
> In both cases the challenge is that the items can be updated. Which means
> I need to look up a current version of an item and decide how to proceed.
> That puts a huge performance penalty on the application (# of line items we
> receive is in the millions and we need to process them in a timely fashion).
>
> Help me out here -- any ideas on how I could design this in Cassandra ?
>
>
> Regards,
> Oleg
>
>
>
