incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Burroughs <chris.burrou...@gmail.com>
Subject Re: Distinct Counter Proposal for Cassandra
Date Fri, 29 Jun 2012 13:27:41 GMT
On 06/13/2012 01:00 PM, Yuki Morishita wrote:
> The above implementation and most of the other ones (including stream-lib) implement
the optimized version of the algorithm which counts up to 10^9, so may need some work.
> 
> Other alternative is self-learning bitmap (http://ect.bell-labs.com/who/aychen/sbitmap4p.pdf)
which, in my understanding, is more memory efficient when counting small values.

The closest we could get to a one-size fits all would probably be an
adaptive counting scheme that uses linear counting (or self-learning
bitmap, didn't know about that one!) for small expected cardinalities
and a LogLog variant for higher ones.  It's more choices to make, but
choosing between "not too big" and "really really big" doesn't seem like
an unreasonable burden to me.

Mime
View raw message