The implementation of distributed counters is more complicated than your example; there is
a design doc attached to the ticket here: https://issues.apache.org/jira/browse/CASSANDRA-1072
By collapsing some of those +1 increments together at the application level, there is less
work for the cluster to do. This can be important when the numbers are big: http://blog.twitter.com/2011/03/numbers.html
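A minimal sketch of what collapsing increments at the application level could look like, in Java since that is what Cassandra is written in. The class and method names are hypothetical, and the actual write to Cassandra is deliberately left out: each entry returned by flush() would become one counter update instead of one update per event. A periodic flush (e.g. once a minute, like the buffering mentioned in the quoted message) would be driven by a timer outside this class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: coalesces +1 events per key before they reach the cluster.
class CoalescingCounterBuffer {
    // Pending deltas per counter key, held in application memory only.
    private final Map<String, Long> pending = new HashMap<>();

    // Record one +1 event; nothing is sent to Cassandra here.
    public synchronized void increment(String key) {
        pending.merge(key, 1L, Long::sum);
    }

    // Drain the buffer: one combined delta per key, then clear it.
    // The caller would issue one counter write per returned entry.
    public synchronized Map<String, Long> flush() {
        Map<String, Long> batch = new TreeMap<>(pending); // sorted for readable output
        pending.clear();
        return batch;
    }

    public static void main(String[] args) {
        CoalescingCounterBuffer buf = new CoalescingCounterBuffer();
        for (int i = 0; i < 1000; i++) {
            buf.increment("example.com");
        }
        buf.increment("apache.org");
        // 1001 events collapse into 2 counter writes
        Map<String, Long> batch = buf.flush();
        System.out.println(batch.size() + " writes: " + batch);
    }
}
```

The trade-off is that counts held in the buffer are lost if the process dies before a flush, in exchange for far fewer writes to the cluster.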
Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 21 May 2011, at 09:04, Yang wrote:
> (sorry if Rainbird is not a relevant enough topic; I'd appreciate it if
> someone could point me to a more appropriate venue in that case)
>
>
> Rainbird buffers up 1 minute worth of events first before writing to Cassandra.
>
> It seems that this extra layer of buffering is redundant and could
> be avoided: Cassandra's memtable already does buffering, and its
> internal implementation is just
> Map.put(key, CF). I guess Rainbird does something similar:
> column_to_count = map.get(key); column_to_count++; map.put(key,
> column_to_count) ??
> The "++" part is probably already done by the Distributed Counters in
> Cassandra.
> Then I guess the Rainbird layer exists because it needs to parse an
> incoming event into the various attributes it is interested in: for
> example, from a URL we bump up the counts of the
> FQDN, domain, path, etc. Rainbird does the transformation from
> URL ---> 3 attrs.
>
> But I guess that transformation might as well be done in the Cassandra
> JVM itself, if we could provide some hooks so that a module
> translates an incoming request into
> multiple keys and bumps up their counts. That way we avoid the
> intermediate communication from clients to Rainbird, and from Rainbird to
> Cassandra. Are there some points I'm missing?
>
> Thanks
> Yang
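A rough sketch of the URL ---> 3 attrs fan-out described in the quoted message, wherever it ends up running. The names are hypothetical, and two choices here are assumptions rather than anything Rainbird is known to do: "domain" is naively taken as the last two host labels (wrong for hosts like example.co.uk), and the path key is prefixed with the host so identical paths on different sites don't share a counter. Counts are accumulated locally in a map, which is where the coalescing from the reply above would come in.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: turns one URL event into several counter keys.
class UrlCounterKeys {
    static List<String> keysFor(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        // Naive assumption: registrable domain = last two labels of the host.
        String[] labels = host.split("\\.");
        String domain = labels.length >= 2
                ? labels[labels.length - 2] + "." + labels[labels.length - 1]
                : host;
        // An http URL with no path yields "", so normalise it to "/".
        String path = uri.getPath().isEmpty() ? "/" : uri.getPath();
        List<String> keys = new ArrayList<>();
        keys.add("fqdn:" + host);
        keys.add("domain:" + domain);
        keys.add("path:" + host + path);
        return keys;
    }

    public static void main(String[] args) {
        // Local per-key counts: one event bumps three counters.
        Map<String, Long> counts = new TreeMap<>();
        String[] events = {
            "http://blog.twitter.com/2011/03/numbers.html",
            "http://www.twitter.com/about"
        };
        for (String e : events) {
            for (String k : keysFor(e)) {
                counts.merge(k, 1L, Long::sum);
            }
        }
        counts.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```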