On 06/13/2012 01:00 PM, Yuki Morishita wrote:
> The above implementation and most of the other ones (including stream-lib) implement
the optimized version of the algorithm which counts up to 10^9, so may need some work.
>
> Other alternative is self-learning bitmap (http://ect.bell-labs.com/who/aychen/sbitmap4p.pdf)
which, in my understanding, is more memory efficient when counting small values.
The closest we could get to a one-size fits all would probably be an
adaptive counting scheme that uses linear counting (or self-learning
bitmap, didn't know about that one!) for small expected cardinalities
and a LogLog variant for higher ones. It's more choices to make, but
choosing between "not too big" and "really really big" doesn't seem like
an unreasonable burden to me.
|