datafu-dev mailing list archives

From "Jan Willem (JIRA)" <>
Subject [jira] [Commented] (DATAFU-91) pig version of HyperLogLog estimator should be Algebraic and use combiners
Date Thu, 11 Jun 2015 09:00:15 GMT


Jan Willem commented on DATAFU-91:

I am under the impression that the types output by both initial and intermediate should be
the same. Looking at how AVG is implemented, or at the UDF manual on the wiki (search for
"algebraic"), both return a tuple containing the combined information.
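
To illustrate what "a tuple containing the combined information" means for an AVG-style function, here is a minimal sketch in plain Java with no Pig dependency. The class and method names (AvgStages, Partial, initial, intermediate, finalStage) are hypothetical and only mirror the three stages; this is not Pig's actual AVG code. The point is that the initial and intermediate stages both emit the same (sum, count) shape, so partial results can be combined any number of times.

```java
import java.util.List;

// Hypothetical sketch of the three algebraic stages for an average.
final class AvgStages {
    // Partial state emitted by BOTH the initial and intermediate stages.
    static final class Partial {
        final double sum;
        final long count;

        Partial(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Initial stage: wrap a single input value as a partial result.
    static Partial initial(double value) {
        return new Partial(value, 1L);
    }

    // Intermediate stage: combine partial results into one of the same type.
    static Partial intermediate(List<Partial> partials) {
        double sum = 0.0;
        long count = 0L;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    // Final stage: combine whatever is left and produce the answer.
    static double finalStage(List<Partial> partials) {
        Partial total = intermediate(partials);
        // NaN for empty input is a choice made for this sketch only.
        return total.count == 0 ? Double.NaN : total.sum / total.count;
    }
}
```

Because intermediate consumes and produces the same type, it does not matter whether the combiner runs zero, one, or several times before the final stage.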

I'm probably just obsessing about the type cast, but you could get around it by using a tagged
union: a tuple of two, with the first element indicating whether it's a Long in the second
field, or rather a serialized HyperLogLogPlus. So: (1, <longvalue>) or (2, <HyperLogLogPlus>).
Alternatively, the first element could hold the number of items, with 1 as the special case
in which the second field is just a long value: (1, <longvalue>) or (<value larger than
1>, <HyperLogLogPlus>).

It would only add a little to the size, and get rid of the instanceof.
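
A rough sketch of the tagged-union idea, in plain Java so it stands alone: TaggedPartial, the tag constants, and describe() are hypothetical names, and a byte[] stands in for the serialized HyperLogLogPlus. In a real UDF this would be a two-field Pig tuple rather than a class. It only shows how reading the tag replaces the instanceof check.

```java
import java.util.Arrays;

// Hypothetical stand-in for the two-field tuple (tag, payload) described above.
final class TaggedPartial {
    static final int TAG_LONG = 1;   // payload is a single Long value
    static final int TAG_SKETCH = 2; // payload is a serialized estimator as byte[]

    final int tag;
    final Object payload;

    private TaggedPartial(int tag, Object payload) {
        this.tag = tag;
        this.payload = payload;
    }

    static TaggedPartial ofLong(long value) {
        return new TaggedPartial(TAG_LONG, Long.valueOf(value));
    }

    static TaggedPartial ofSketch(byte[] serialized) {
        // Defensive copy so later changes to the caller's array do not leak in.
        return new TaggedPartial(TAG_SKETCH, Arrays.copyOf(serialized, serialized.length));
    }

    // Dispatch on the tag instead of using instanceof on the payload.
    String describe() {
        switch (tag) {
            case TAG_LONG:
                return "long:" + (Long) payload;
            case TAG_SKETCH:
                return "sketch:" + ((byte[]) payload).length + " bytes";
            default:
                throw new IllegalStateException("unknown tag " + tag);
        }
    }
}
```

The extra cost is one small integer per partial result, which is the "little added size" mentioned above.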

It's just a matter of opinion, I guess.

> pig version of HyperLogLog estimator should be Algebraic and use combiners
> --------------------------------------------------------------------------
>                 Key: DATAFU-91
>                 URL:
>             Project: DataFu
>          Issue Type: Bug
>    Affects Versions: 1.3.0
>            Reporter: Ido Hadanny
>            Assignee: Ido Hadanny
>            Priority: Minor
>             Fix For: 1.3.0
>         Attachments: hyper-log-log-algebraic-3.diff, hyper-log-log-algebraic.diff, hyper-log-log-algebraic.diff
> Matt: I don't remember if there was a particular reason I didn't implement this as AlgebraicEvalFunc.
> It seems like it could be. I believe the Java MapReduce version leverages the combiner. If
> you want to try making this Algebraic we would be happy to accept a patch :)

This message was sent by Atlassian JIRA