pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1836) Accumulator like interface should be used with Pig operators after (co)group in certain cases
Date Wed, 02 Mar 2011 01:46:37 GMT

     [ https://issues.apache.org/jira/browse/PIG-1836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1836:
--------------------------------

    Fix Version/s: 0.10

> Accumulator like interface should be used with Pig operators after (co)group in certain cases
> ---------------------------------------------------------------------------------------------
>
>                 Key: PIG-1836
>                 URL: https://issues.apache.org/jira/browse/PIG-1836
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Alan Gates
>             Fix For: 0.10
>
>
> There are a number of cases where people (co)group their data and then pass it to an
> operator other than a foreach with a UDF, but where an accumulator-like interface would
> still make sense. A few examples:
> {code}
> C = group B by $0;
> D = foreach C generate flatten(B);
> ...
> C = group B by $0;
> D = stream C through 'script.py';
> ...
> C = group B by $0;
> store C into 'output';
> {code}
> In all of these cases the following operator does not need to hold all of a group's data
> in memory at once. There may be other such cases as well. Changing this part of the
> pipeline would greatly speed up these types of queries and make them less likely to die
> with out-of-memory errors.
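
The idea can be illustrated outside Pig with a small Java sketch. It assumes a hypothetical `BatchConsumer` interface, loosely modeled on Pig's `Accumulator` UDF interface (where `accumulate` is handed a portion of a group's bag on each call); it is not Pig's API and not the design chosen for PIG-1836. The feeder buffers at most `batchSize` values of a group at a time, so a stateless, flatten-like consumer never needs the whole group in memory.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical, Pig-independent sketch of accumulator-style processing:
 * a group's values are delivered in fixed-size batches instead of as one
 * fully materialized bag. NOT Pig's actual API.
 */
public class AccumulatorSketch {

    /** Hypothetical interface: receives one batch per call, then finish(). */
    interface BatchConsumer<T> {
        void accumulate(List<T> batch); // called once per batch of the group
        void finish();                  // called after the last batch
    }

    /** Flatten-like consumer: passes each record downstream, keeps no per-group state. */
    static final class FlattenConsumer<T> implements BatchConsumer<T> {
        private final List<T> downstream; // stands in for the next operator's output
        int maxBatchSeen = 0;             // records the largest batch delivered

        FlattenConsumer(List<T> downstream) { this.downstream = downstream; }

        public void accumulate(List<T> batch) {
            maxBatchSeen = Math.max(maxBatchSeen, batch.size());
            downstream.addAll(batch); // "emit" the records; nothing is retained here
        }

        public void finish() { /* nothing to flush for flatten */ }
    }

    /** Feeds a group's values to the consumer in batches of at most batchSize. */
    static <T> void feedInBatches(Iterator<T> values, int batchSize, BatchConsumer<T> consumer) {
        List<T> batch = new ArrayList<>(batchSize);
        while (values.hasNext()) {
            batch.add(values.next());
            if (batch.size() == batchSize) {
                consumer.accumulate(batch);
                batch = new ArrayList<>(batchSize); // start a fresh buffer for the next batch
            }
        }
        if (!batch.isEmpty()) {
            consumer.accumulate(batch); // trailing partial batch
        }
        consumer.finish();
    }

    public static void main(String[] args) {
        List<Integer> out = new ArrayList<>();
        FlattenConsumer<Integer> flatten = new FlattenConsumer<>(out);
        feedInBatches(List.of(1, 2, 3, 4, 5, 6, 7).iterator(), 3, flatten);
        System.out.println(out + " max batch = " + flatten.maxBatchSeen);
    }
}
```

Running `main` prints `[1, 2, 3, 4, 5, 6, 7] max batch = 3`: every record reaches the downstream list in order, while the feeder never buffered more than three values. An operator such as `stream` or `store` could, under the same assumption, write each batch out as it arrives.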

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
