cassandra-user mailing list archives

From aaron morton <>
Subject Re: rainbird question (why is the 1minute buffer needed?)
Date Sun, 22 May 2011 10:47:18 GMT
The implementation of distributed counters is more complicated than your example; there is
a design doc attached to the ticket here.

By collapsing some of those +1 increments together at the application level, there is less
work for the cluster to do. This can be important when the numbers are big.
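
As a rough illustration of that collapsing (a minimal sketch, not Rainbird's actual code): increments are summed per key in memory, and each flush sends one delta per key instead of one write per event. The class name IncrementBuffer and the write_delta callback are hypothetical; write_delta stands in for whatever client call sends a single counter increment to the cluster.

```python
from collections import Counter


class IncrementBuffer:
    """Collapses counter increments per key in memory until flushed."""

    def __init__(self, write_delta):
        # write_delta(key, delta) is a hypothetical callback standing in for
        # the client call that sends one counter increment to the cluster.
        self._write_delta = write_delta
        self._pending = Counter()

    def increment(self, key, amount=1):
        # No network traffic here: the increment is only summed locally.
        self._pending[key] += amount

    def flush(self):
        """Send one write per key with the summed delta; return the write count."""
        pending, self._pending = self._pending, Counter()
        for key, delta in pending.items():
            self._write_delta(key, delta)
        return len(pending)


# Usage: 1250 events collapse into 2 writes.
sent = []
buf = IncrementBuffer(lambda key, delta: sent.append((key, delta)))
for _ in range(1000):
    buf.increment("example.com")
for _ in range(250):
    buf.increment("example.com/path")
writes = buf.flush()  # in practice, called on a timer (e.g. once a minute)
```

The trade-off is that any increments still sitting in the buffer are lost if the process dies before the next flush, so the flush interval bounds how much can go missing.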


Aaron Morton
Freelance Cassandra Developer

On 21 May 2011, at 09:04, Yang wrote:

> (sorry if Rainbird is not a topic relevant enough, I'd appreciate if
> someone could point me to a more appropriate venue in that case)
> Rainbird buffers up 1 minute's worth of events before writing to Cassandra.
> It seems that this extra layer of buffering is redundant and could
> be avoided: Cassandra's memtable already does buffering, and its
> internal implementation is just
> Map.put(key, CF). I guess Rainbird does something similar:
> column_to_count = map.get(key); column_to_count++; map.put(key,
> column_to_count)?
> The "++" part is probably already done by the Distributed Counters in
> Cassandra.
> Then I guess the Rainbird layer exists because it needs to parse an
> incoming event into the various attributes that it is interested in: for
> example, from a URL we bump up the counts of
> FQDN, domain, path, etc. Rainbird does the transformation from
> url--->3 attrs.
> But I guess that transformation might as well be done in the Cassandra
> JVM itself, if we could provide some hooks so that a module
> translates an incoming request into
> multiple keys and bumps up their counts. That way we avoid the
> intermediate communication from clients to Rainbird, and from Rainbird to
> Cassandra. Are there some points I'm missing?
> Thanks
> Yang
