datafu-dev mailing list archives

From Ido Hadanny <ido.hada...@gmail.com>
Subject Re: why is data.fu implementing HyperLogLog as an accumulator and not as algebraic?
Date Mon, 27 Apr 2015 13:26:35 GMT
Hey guys,
The patch is attached. It passes the unit tests, and we're testing it on a
real 1000-node Hadoop cluster as we speak.
Do you want us to create a JIRA issue for this, or is this good enough?
Thanks, Ilia and Ido
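
For readers without the attachment, here is a minimal sketch of why the
algebraic split is possible. It is not the attached patch and does not use
the Pig or DataFu APIs: the class name, the method names, the precision of
12 bits, and the SplitMix64-style hash are all assumptions made for
illustration, and it implements plain HyperLogLog rather than
HyperLogLog++. The point is only that registers built on separate splits
can be merged with an element-wise max, which gives the same registers as
a single pass over all the data.

```java
import java.util.Arrays;

// Illustrative only: a plain HyperLogLog split into the three phases an
// algebraic function needs (initial / intermediate / final).
class AlgebraicHllSketch {
    static final int P = 12;       // assumed precision: 2^12 registers
    static final int M = 1 << P;

    // Stand-in 64-bit hash (SplitMix64 finalizer); an assumption, not DataFu's hash.
    static long hash(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    // "Initial" phase (map side): fold one item into a register array.
    static void add(byte[] registers, long item) {
        long h = hash(item);
        int idx = (int) (h >>> (64 - P));   // top P bits select the register
        long rest = h << P;                 // remaining bits, left-aligned
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - P + 1);
        if (rank > registers[idx]) registers[idx] = (byte) rank;
    }

    // "Intermediate" phase (combiner): element-wise max of two register arrays.
    // Max is commutative and associative, so partial results merge in any order.
    static byte[] merge(byte[] a, byte[] b) {
        byte[] out = new byte[M];
        for (int i = 0; i < M; i++) out[i] = (byte) Math.max(a[i], b[i]);
        return out;
    }

    // "Final" phase (reducer): turn the merged registers into an estimate.
    static double estimate(byte[] registers) {
        double sum = 0;
        int zeros = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
            if (r == 0) zeros++;
        }
        double alpha = 0.7213 / (1 + 1.079 / M);
        double raw = alpha * M * M / sum;
        // Small-range correction (linear counting) from the original HLL paper.
        if (raw <= 2.5 * M && zeros > 0) return M * Math.log((double) M / zeros);
        return raw;
    }

    public static void main(String[] args) {
        byte[] split1 = new byte[M], split2 = new byte[M], all = new byte[M];
        for (long i = 0; i < 100_000; i++) {
            add(i % 2 == 0 ? split1 : split2, i);  // two simulated map splits
            add(all, i);                           // single-pass reference
        }
        byte[] merged = merge(split1, split2);
        System.out.println("merged equals single pass: " + Arrays.equals(merged, all));
        System.out.println("estimate: " + Math.round(estimate(merged)));
    }
}
```

In a real Pig UDF the register array would have to be serialized into a
tuple between phases; that part is omitted here.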

On 7 March 2015 at 23:09, Matthew Hayes <matthew.terence.hayes@gmail.com>
wrote:

> I don't remember if there was a particular reason I didn't implement this
> as AlgebraicEvalFunc. It seems like it could be. I believe the Java
> MapReduce version leverages the combiner. If you want to try making this
> Algebraic we would be happy to accept a patch :)
>
> -Matt
>
> > On Mar 7, 2015, at 12:11 PM, Ido Hadanny <ido.hadanny@gmail.com> wrote:
> >
> > data.fu has a nice implementation of HyperLogLog for estimating
> > cardinality here:
> > <https://github.com/apache/incubator-datafu/blob/master/datafu-pig/src/main/java/datafu/pig/stats/HyperLogLogPlusPlus.java>
> >
> > However, it's implemented as an Accumulator, which means it will run only
> > at the reducer and not in the combiner (although, unlike a normal
> > EvalFunc, it never loads the entire set into memory). Why couldn't data.fu
> > implement it as Algebraic: fill the registers at every combiner, then
> > merge and reduce the result? Am I missing something here?
> > Also available here:
> > http://stackoverflow.com/questions/28908217/why-is-data-fu-implementing-hyperloglog-as-an-accumulator-and-not-as-algebraic
> >
> > thanks!
> >
> >
> > --
> > Sent from my androido
>



-- 
Sent from my androido
