datafu-dev mailing list archives

From Ido Hadanny <ido.hada...@gmail.com>
Subject why is data.fu implementing HyperLogLog as an accumulator and not as algebraic?
Date Sat, 07 Mar 2015 20:11:13 GMT
DataFu has a nice implementation of HyperLogLog for estimating cardinality
here:
<https://github.com/apache/incubator-datafu/blob/master/datafu-pig/src/main/java/datafu/pig/stats/HyperLogLogPlusPlus.java>

However, it is implemented as an Accumulator, which means it runs only in
the reducer and not in the combiner (although, unlike a plain EvalFunc, it
never loads the entire set into memory). Why couldn't DataFu implement it as
Algebraic: fill the registers in every combiner, then merge them and compute
the estimate in the reducer? Am I missing something here?
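
To make the question concrete, here is a minimal toy sketch in Python of the property it relies on. It is not DataFu's code and does not use the Pig API. The names `initial` and `merge`, the precision `P`, and the SHA-1 based hash are all assumptions made for illustration. It only shows that registers filled on separate partitions can be merged with an element-wise maximum and give the same registers as a single pass over all the data, which is what a combiner-side (Algebraic) implementation would need.

```python
import hashlib

P = 4            # hypothetical precision: 2**P registers (real sketches use more)
M = 1 << P       # number of registers


def _hash64(value):
    """Deterministic 64-bit hash of a value (illustrative choice, not DataFu's)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def initial(values):
    """Combiner-side step: fill registers from one partition of the input."""
    registers = [0] * M
    for v in values:
        h = _hash64(v)
        idx = h >> (64 - P)                       # top P bits select the register
        rest = h & ((1 << (64 - P)) - 1)          # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1   # position of the leftmost 1-bit
        registers[idx] = max(registers[idx], rank)
    return registers


def merge(a, b):
    """Intermediate step: combine two partial register arrays."""
    return [max(x, y) for x, y in zip(a, b)]


# Two "combiners" each see half of the data; the "reducer" merges them.
part_a = initial(range(0, 500))
part_b = initial(range(500, 1000))
merged = merge(part_a, part_b)

# Same registers as processing everything in one place.
single_pass = initial(range(1000))
print(merged == single_pass)
```

Because the maximum is associative and commutative, the order and grouping of the merges does not matter. A final step would then apply the usual HyperLogLog estimator to the merged registers.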
Also posted on Stack Overflow:
http://stackoverflow.com/questions/28908217/why-is-data-fu-implementing-hyperloglog-as-an-accumulator-and-not-as-algebraic

Thanks!


-- 
Sent from my androido
