datafu-dev mailing list archives

From "Matthew Hayes (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DATAFU-116) Make SetIntersect and SetDifference implement Accumulator
Date Tue, 08 Mar 2016 15:12:40 GMT

    [ https://issues.apache.org/jira/browse/DATAFU-116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15185046#comment-15185046 ]

Matthew Hayes commented on DATAFU-116:
--------------------------------------

I don't think an efficient accumulator implementation is possible for these UDFs. We have
no control over how the data from each bag is fed into the accumulate method. You'd be forced
to hold values from the bags in memory, which makes memory usage worse.
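
To illustrate the buffering concern, here is a minimal sketch. It is not DataFu or
Pig code: the class ChunkedIntersectSketch and its two-argument accumulate method are
hypothetical stand-ins, and it assumes each bag is sorted, contains distinct integers,
and arrives in chunks whose boundaries the UDF cannot choose.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

/** Hypothetical accumulator-style intersect; a simulation, not the Pig Accumulator API. */
public class ChunkedIntersectSketch {
    private final Deque<Integer> pendingA = new ArrayDeque<>();
    private final Deque<Integer> pendingB = new ArrayDeque<>();
    private final List<Integer> output = new ArrayList<>();
    private int peakBuffered = 0;

    /** Receives one chunk of each sorted bag; either chunk may be empty. */
    public void accumulate(List<Integer> chunkA, List<Integer> chunkB) {
        pendingA.addAll(chunkA);
        pendingB.addAll(chunkB);
        // Track the largest number of values held in memory at any one time.
        peakBuffered = Math.max(peakBuffered, pendingA.size() + pendingB.size());

        // A merge step is only possible while both sides have data.
        // Whatever is left on one side has to stay buffered until the other side arrives.
        while (!pendingA.isEmpty() && !pendingB.isEmpty()) {
            int cmp = Integer.compare(pendingA.peekFirst(), pendingB.peekFirst());
            if (cmp == 0) {
                output.add(pendingA.pollFirst());
                pendingB.pollFirst();
            } else if (cmp < 0) {
                pendingA.pollFirst(); // smaller than everything left in B, cannot match
            } else {
                pendingB.pollFirst(); // smaller than everything left in A, cannot match
            }
        }
    }

    public List<Integer> getValue() {
        return output;
    }

    public int getPeakBuffered() {
        return peakBuffered;
    }

    public static void main(String[] args) {
        ChunkedIntersectSketch skewed = new ChunkedIntersectSketch();
        // Worst case for this sketch: all of bag A is delivered before any of bag B.
        skewed.accumulate(Arrays.asList(1, 2, 3, 4), Collections.emptyList());
        skewed.accumulate(Collections.emptyList(), Arrays.asList(2, 4));
        System.out.println(skewed.getValue() + " peak=" + skewed.getPeakBuffered());
    }
}
```

In the skewed delivery used in main, nothing can be merged until the second call, so
every value of both bags is buffered at once. Only when the chunks of the two bags
happen to cover similar value ranges does the buffer stay small.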

> Make SetIntersect and SetDifference implement Accumulator
> ---------------------------------------------------------
>
>                 Key: DATAFU-116
>                 URL: https://issues.apache.org/jira/browse/DATAFU-116
>             Project: DataFu
>          Issue Type: Improvement
>    Affects Versions: 1.3.0
>            Reporter: Eyal Allweil
>
> SetIntersect and SetDifference accept only sorted bags, and the output is always smaller
> than the inputs. Therefore an accumulator implementation should be possible and it will improve
> memory usage (somewhat) and allow Pig to optimize loops with these operations better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
