hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (PIG-844) PERFORMANCE: streaming data to the UDFs in foreach
Date Mon, 23 Nov 2009 17:26:45 GMT

     [ https://issues.apache.org/jira/browse/PIG-844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich resolved PIG-844.

The accumulate interface took care of this.

> PERFORMANCE: streaming data to the UDFs in foreach
> --------------------------------------------------
>                 Key: PIG-844
>                 URL: https://issues.apache.org/jira/browse/PIG-844
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Olga Natkovich
> Currently, Pig places the data passed to UDFs into a bag. This can cause the process
> to use more memory than actually needed, as in many cases it would be better to push the data
> one tuple at a time to the UDFs.
> For the case where the combiner is invoked, this might not be that important; however, for
> non-algebraic UDFs, as well as other cases where the combiner can't be used, this can provide
> a significant memory improvement.
> Another possible use case is where the data is already grouped going into Pig and we
> don't need to group it again.
> How this will affect the UDF interface needs to be further discussed.
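
The memory trade-off described in the issue can be illustrated with a small sketch. This is plain Python, not Pig's UDF API: the names `max_gap_bagged`, `StreamingMaxGap`, and `max_gap_streamed`, the `(key, value)` record shape, and the method names are all assumptions made up for this example. It uses an order-dependent function (largest gap between consecutive values), which stands in for the kind of UDF that a combiner cannot help with.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical record shape: (group key, value).
Row = Tuple[str, int]


def max_gap_bagged(rows: Iterable[Row]) -> int:
    """Bag style: collect every tuple of the group before computing anything."""
    bag: List[Row] = list(rows)  # the whole group sits in memory at once
    values = [value for _, value in bag]
    return max((abs(b - a) for a, b in zip(values, values[1:])), default=0)


class StreamingMaxGap:
    """Streaming style: receive one tuple at a time and keep constant state."""

    def __init__(self) -> None:
        self.prev: Optional[int] = None  # previous value seen in this group
        self.best = 0                    # largest gap seen so far

    def accumulate(self, row: Row) -> None:
        _, value = row
        if self.prev is not None:
            self.best = max(self.best, abs(value - self.prev))
        self.prev = value

    def get_value(self) -> int:
        return self.best

    def cleanup(self) -> None:
        # Reset state so the same object can process the next group.
        self.prev = None
        self.best = 0


def max_gap_streamed(rows: Iterable[Row]) -> int:
    """Drive the streaming object the way a caller might: push, read, reset."""
    udf = StreamingMaxGap()
    for row in rows:
        udf.accumulate(row)
    result = udf.get_value()
    udf.cleanup()
    return result
```

Both functions return the same answer for the same input, but the streamed version only ever holds the previous value and the running maximum, so its memory use does not grow with the size of the group.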

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
