datafu-dev mailing list archives

From "Eyal Allweil (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DATAFU-116) Make SetIntersect and SetDifference implement Accumulator
Date Thu, 10 Mar 2016 11:24:40 GMT

    [ https://issues.apache.org/jira/browse/DATAFU-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189158#comment-15189158
] 

Eyal Allweil commented on DATAFU-116:
-------------------------------------

As far as I know, the behavior you're describing is how Pig handles UDFs that implement
the Accumulator interface. If a UDF doesn't implement it (that is, it only extends EvalFunc),
its parameters (including bags) are passed to it in memory in their entirety. I'm basing this
on [this quote from Programming Pig|http://stackoverflow.com/a/15813789/150992]. That's why
I'm suggesting this change.
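
To make the distinction concrete, here is a minimal, self-contained sketch of the accumulator idea. It does not use Pig itself: BatchAccumulator is a hypothetical stand-in that only mirrors the accumulate / getValue / cleanup shape of Pig's Accumulator interface, and SortedDistinctCount is an invented example, not DataFu code. It assumes the input arrives sorted, which is why only a small amount of state has to be kept between batches.

```java
import java.util.Arrays;
import java.util.List;

public class AccumulatorSketch {

    // Hypothetical stand-in mirroring the shape of Pig's Accumulator
    // (accumulate / getValue / cleanup). This is NOT the real Pig interface.
    interface BatchAccumulator<I, O> {
        void accumulate(List<I> batch); // called once per chunk of the bag
        O getValue();                   // called after the last chunk
        void cleanup();                 // resets state before the next key
    }

    // Counts distinct values in a SORTED stream that arrives in batches.
    // Because the input is sorted, the only state carried across batches
    // is the last value seen and the running count.
    static class SortedDistinctCount implements BatchAccumulator<Integer, Long> {
        private Integer last = null;
        private long count = 0;

        @Override
        public void accumulate(List<Integer> batch) {
            for (Integer value : batch) {
                if (last == null || !last.equals(value)) {
                    count++;
                    last = value;
                }
            }
        }

        @Override
        public Long getValue() {
            return count;
        }

        @Override
        public void cleanup() {
            last = null;
            count = 0;
        }
    }

    public static void main(String[] args) {
        SortedDistinctCount acc = new SortedDistinctCount();
        // The full bag is never held at once; each batch is processed and dropped.
        acc.accumulate(Arrays.asList(1, 1, 2));
        acc.accumulate(Arrays.asList(2, 3));
        acc.accumulate(Arrays.asList(3, 3, 5));
        System.out.println(acc.getValue()); // prints 4
        acc.cleanup();
    }
}
```

A UDF that only extends EvalFunc would instead receive the whole bag in a single call, so its memory use grows with the size of the bag rather than with the size of one batch.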



> Make SetIntersect and SetDifference implement Accumulator
> ---------------------------------------------------------
>
>                 Key: DATAFU-116
>                 URL: https://issues.apache.org/jira/browse/DATAFU-116
>             Project: DataFu
>          Issue Type: Improvement
>    Affects Versions: 1.3.0
>            Reporter: Eyal Allweil
>
> SetIntersect and SetDifference accept only sorted bags, and the output is always smaller
> than the inputs. Therefore an accumulator implementation should be possible and it will improve
> memory usage (somewhat) and allow Pig to optimize loops with these operations better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
