pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-1660) Consider passing result of COUNT/COUNT_STAR to LIMIT
Date Thu, 03 Mar 2011 01:17:37 GMT

     [ https://issues.apache.org/jira/browse/PIG-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-1660:
--------------------------------

    Fix Version/s: 0.10

> Consider passing result of COUNT/COUNT_STAR to LIMIT 
> -----------------------------------------------------
>
>                 Key: PIG-1660
>                 URL: https://issues.apache.org/jira/browse/PIG-1660
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: 0.7.0
>            Reporter: Viraj Bhat
>             Fix For: 0.10
>
>
> In realistic scenarios we need to split a dataset into segments using LIMIT, and we would
> like to do that within a single Pig script. Here is a case:
> {code}
> A = load '$DATA' using PigStorage(',') as (id, pvs);
> B = group A all;
> C = foreach B generate COUNT_STAR(A) as row_cnt;
> -- get the low 20% segment
> D = order A by pvs;
> E = limit D (C.row_cnt * 0.2);
> store E into '$Eoutput';
> -- get the high 20% segment
> F = order A by pvs DESC;
> G = limit F (C.row_cnt * 0.2);
> store G into '$Goutput';
> {code}
> Since LIMIT only accepts constants, we have to split the operation into two steps in order
> to pass the constants to the LIMIT statements. Please consider letting LIMIT accept the
> result of COUNT/COUNT_STAR so the whole job can run in one script and be more efficient.
> Viraj
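
For illustration, the two-step workaround described above might look roughly like the
following. This is only a sketch: the script names (count.pig, segment.pig) and the
parameters $COUNT_OUT and $N are hypothetical and do not appear in the issue. It assumes
the row count is read from the first job's output and the limit is computed outside Pig.

{code}
-- count.pig (hypothetical name): step 1, compute and store the row count
A = load '$DATA' using PigStorage(',') as (id, pvs);
B = group A all;
C = foreach B generate COUNT_STAR(A) as row_cnt;
store C into '$COUNT_OUT';  -- $COUNT_OUT is an assumed parameter
{code}

{code}
-- segment.pig (hypothetical name): step 2, run with the precomputed constant, e.g.
--   pig -param DATA=... -param N=... -param Eoutput=... -param Goutput=... segment.pig
-- where N is 20% of the stored row count, rounded to an integer outside Pig
A = load '$DATA' using PigStorage(',') as (id, pvs);
-- get the low 20% segment
D = order A by pvs;
E = limit D $N;
store E into '$Eoutput';
-- get the high 20% segment
F = order A by pvs desc;
G = limit F $N;
store G into '$Goutput';
{code}

The second script loads and sorts the data again after a separate counting job, which is
the extra work the issue asks to avoid.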

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
