cassandra-user mailing list archives

From Mike Peters <cassan...@softwareprojects.com>
Subject Cassandra 0.8 Counters Inverted Index?
Date Sat, 01 Oct 2011 05:19:21 GMT
Hi,

We're using Cassandra 0.8 counters in production and loving it!

One issue we're running into: we need an efficient mechanism to 
retrieve the "top 100" results, sorted by counter value.

We have tens of thousands of counters, growing rapidly (one counter per 
date.source_id combination).  What's the best way to retrieve the top 
100 "sources" for a given date without having to iterate through all 
the counters created for that date?

Right now to accomplish this, we are managing an inverted index of count 
values.  This is very inefficient and kills our write performance, 
because after every counter-increment, we have to read its value and 
store it into an inverted index that looks like this:

Key,   CounterName
000005 2011-10-01.source1
000009 2011-10-01.source2
000010 2011-10-01.source3

If source2 just generated 100 "hits", we need to delete the row with the 
key of "000009" from the inverted index and insert a new one with the 
new counter value for source2:

Key,   CounterName
000005 2011-10-01.source1
000010 2011-10-01.source3
000109 2011-10-01.source2

The additional reads and deletes are killing our performance.
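
To make the extra work concrete, here is a minimal in-memory sketch in Python of the update path described above. It is not Cassandra client code: the `counters` dict and the sorted `index` list are hypothetical stand-ins for the counter column family and the inverted index for a single date, and the names `increment`, `index_key` and `top` are made up for illustration.

```python
import bisect

counters = {}  # counter name -> current count (stand-in for the counters)
index = []     # sorted (zero-padded count, counter name) pairs (stand-in for the inverted index)


def index_key(count):
    # Zero-pad so that string order matches numeric order (up to 999999).
    return "%06d" % count


def increment(name, delta):
    old = counters.get(name)               # extra read after the increment
    new = (old or 0) + delta
    counters[name] = new
    if old is not None:
        pos = bisect.bisect_left(index, (index_key(old), name))
        del index[pos]                     # extra delete of the stale index entry
    bisect.insort(index, (index_key(new), name))  # insert under the new count


def top(n):
    # Highest counts sit at the end of the sorted index.
    return [name for _, name in reversed(index[-n:])]


increment("2011-10-01.source1", 5)
increment("2011-10-01.source2", 9)
increment("2011-10-01.source3", 10)
increment("2011-10-01.source2", 100)       # source2 moves from 000009 to 000109
```

Every call to `increment` on an existing counter costs one read, one delete and one insert on top of the increment itself, which is the overhead we are trying to avoid.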

Does anyone have ideas for a more efficient way to use counters and 
support "top 100" results?

Looking forward to any ideas and feedback you can share.


Thanks,
Mike Peters
