datafu-dev mailing list archives

From Matthew Hayes <matthew.terence.ha...@gmail.com>
Subject Re: why is data.fu implementing HyperLogLog as an accumulator and not as algebraic?
Date Mon, 27 Apr 2015 15:02:53 GMT
Great, thanks :) Please file a JIRA and attach the patch there.

-Matt

> On Apr 27, 2015, at 6:26 AM, Ido Hadanny <ido.hadanny@gmail.com> wrote:
> 
> Hey guys, 
> The patch is attached and passes the unit tests. We're also testing it on a real
> 1000-node Hadoop cluster as we speak.
> Do you want us to create a JIRA issue for this, or is this good enough?
> Thanks, Ilia and Ido
> 
>> On 7 March 2015 at 23:09, Matthew Hayes <matthew.terence.hayes@gmail.com> wrote:
>> I don't remember if there was a particular reason I didn't implement this as AlgebraicEvalFunc.
>> It seems like it could be. I believe the Java MapReduce version leverages the combiner. If
>> you want to try making this Algebraic we would be happy to accept a patch :)
>> 
>> -Matt
>> 
>> > On Mar 7, 2015, at 12:11 PM, Ido Hadanny <ido.hadanny@gmail.com> wrote:
>> >
>> > data.fu has a nice implementation of HyperLogLog for estimating cardinality
>> > here
>> > <https://github.com/apache/incubator-datafu/blob/master/datafu-pig/src/main/java/datafu/pig/stats/HyperLogLogPlusPlus.java>
>> >
>> > However, it's implemented as an Accumulator, which means it runs only in
>> > the reducer and not in the combiner (although, unlike a normal EvalFunc, it
>> > never loads the entire set into memory). Why couldn't data.fu implement it as
>> > Algebraic: fill the registers in every combiner, then merge and reduce
>> > the result? Am I missing something here?
>> > The question is also posted here:
>> > http://stackoverflow.com/questions/28908217/why-is-data-fu-implementing-hyperloglog-as-an-accumulator-and-not-as-algebraic
>> >
>> > thanks!
>> >
>> >
>> > --
>> > Sent from my androido
> 
> 
> 
> -- 
> Sent from my androido
> <hyper-log-log-algebraic.diff>
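
The idea discussed in the thread (fill registers in each combiner, then merge) can be illustrated with a minimal sketch. This is not the attached patch and not DataFu's HyperLogLogPlusPlus code: the class name RegisterMergeSketch, the 16-register size, and the hash mixer are all assumptions made for illustration, and the Pig interfaces are left out entirely. It only shows why the approach is plausible: HyperLogLog-style registers built on separate partitions can be combined with an element-wise max, and the result matches the registers built in one pass over all the data. In Pig terms, fill roughly plays the role of the Initial stage and merge the Intermed/Final stages of an Algebraic UDF.

```java
import java.util.Arrays;

// Toy illustration only: hypothetical names, not DataFu's implementation.
class RegisterMergeSketch {
    static final int P = 4;        // index bits (assumed toy value)
    static final int M = 1 << P;   // 16 registers

    // Mix String.hashCode() so the bits are better spread (MurmurHash3-style finalizer).
    static int hash(String s) {
        int h = s.hashCode();
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Update one register: keep the largest "rank" (leading zeros + 1) seen for its index.
    static void add(byte[] registers, String value) {
        int h = hash(value);
        int idx = h >>> (32 - P);  // top P bits select the register
        int rest = h << P;         // remaining bits, left-aligned
        int rank = Math.min(Integer.numberOfLeadingZeros(rest) + 1, 32 - P + 1);
        if (rank > registers[idx]) {
            registers[idx] = (byte) rank;
        }
    }

    // Per-partition work (what a combiner could do): build registers from raw values.
    static byte[] fill(String[] values) {
        byte[] registers = new byte[M];
        for (String v : values) {
            add(registers, v);
        }
        return registers;
    }

    // Merge two partial results with an element-wise max.
    static byte[] merge(byte[] a, byte[] b) {
        byte[] out = new byte[M];
        for (int i = 0; i < M; i++) {
            out[i] = (byte) Math.max(a[i], b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        String[] part1 = {"alice", "bob", "carol"};
        String[] part2 = {"carol", "dave", "erin"};
        String[] all = {"alice", "bob", "carol", "carol", "dave", "erin"};

        byte[] merged = merge(fill(part1), fill(part2));
        byte[] direct = fill(all);

        // Merged partial registers equal the registers from a single pass.
        System.out.println(Arrays.equals(merged, direct)); // prints "true"
    }
}
```

Because max is associative and commutative, the partial register arrays can be merged in any order and any number of times, which is the property a combiner needs.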
