hadoop-pig-dev mailing list archives

From "Alan Gates (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-484) PERFORMANCE: streaming data to aggregate functions
Date Tue, 11 Nov 2008 01:25:44 GMT

     [ https://issues.apache.org/jira/browse/PIG-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-484:

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

Patch checked in.  I ran performance tests on large data and saw no significant changes. 
This is fine, as this change is more for scalability than performance.

> PERFORMANCE: streaming data to aggregate functions
> --------------------------------------------------
>                 Key: PIG-484
>                 URL: https://issues.apache.org/jira/browse/PIG-484
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Pradeep Kamath
>             Fix For: types_branch
>         Attachments: PIG-484.patch
> Currently, for queries like
> A = load 'data';
> B = group A by $0;
> C = foreach B generate group, MIN(A.$1), MAX(A.$1);
> The data will be put into a bag before being passed to the aggregate functions. This is
> unnecessary and inefficient. In this case, the data can just be streamed to the functions.
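
The difference described above can be illustrated outside Pig with a minimal Python sketch. It is not Pig's implementation or the contents of PIG-484.patch. The function names and the sample rows are hypothetical, and the input is assumed to arrive already grouped by key, as it would after a shuffle. The first function collects each group into a list (standing in for the bag) before aggregating; the second feeds values to running MIN and MAX accumulators one at a time, so memory use per group stays constant.

```python
from itertools import groupby
from operator import itemgetter


def min_max_with_bag(rows):
    """Collect each group's values into a list (the 'bag'), then aggregate."""
    out = []
    for key, group in groupby(rows, key=itemgetter(0)):
        bag = [value for _, value in group]  # whole group held in memory
        out.append((key, min(bag), max(bag)))
    return out


def min_max_streaming(rows):
    """Update running MIN and MAX per value, without building a bag."""
    out = []
    for key, group in groupby(rows, key=itemgetter(0)):
        lo = hi = None
        for _, value in group:
            if lo is None or value < lo:
                lo = value
            if hi is None or value > hi:
                hi = value
        out.append((key, lo, hi))
    return out


# Hypothetical sample data: (group key, value) pairs, already ordered by key.
rows = [("a", 3), ("a", 1), ("a", 7), ("b", 5), ("b", 2)]
```

Both functions return the same result for the same input; only the amount of data held per group differs, which is consistent with the change being described as one for scalability rather than speed.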

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
