hadoop-pig-dev mailing list archives

From "Viraj Bhat (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1660) Consider passing result of COUNT/COUNT_STAR to LIMIT
Date Fri, 01 Oct 2010 01:20:32 GMT
Consider passing result of COUNT/COUNT_STAR to LIMIT 

                 Key: PIG-1660
                 URL: https://issues.apache.org/jira/browse/PIG-1660
             Project: Pig
          Issue Type: Improvement
    Affects Versions: 0.7.0
            Reporter: Viraj Bhat
             Fix For: 0.9.0

In realistic scenarios we need to split a dataset into segments using LIMIT, and we would
like to do so within a single Pig script. Here is an example:

A = load '$DATA' using PigStorage(',') as (id, pvs);
B = group A all;
C = foreach B generate COUNT_STAR(A) as row_cnt;
-- get the low 20% segment
D = order A by pvs;
E = limit D (C.row_cnt * 0.2);
store E into '$Eoutput';
-- get the high 20% segment
F = order A by pvs DESC;
G = limit F (C.row_cnt * 0.2);
store G into '$Goutput';

Since LIMIT only accepts constants, we currently have to split the operation into two steps
so that the computed row limits can be passed to the LIMIT statements as constants. Please
consider adding this feature so that the processing can be more efficient.
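
For illustration, the two-step workaround described above might look roughly like the
following sketch. It assumes Pig's parameter substitution (pig -param) is used to hand the
precomputed limit to the second script. The file names count.pig and segment.pig and the
parameters COUNT_OUT and N are hypothetical and do not come from the original report.

```pig
-- Step 1 (hypothetical file count.pig): compute the row count and store it.
A = load '$DATA' using PigStorage(',') as (id, pvs);
B = group A all;
C = foreach B generate COUNT_STAR(A) as row_cnt;
store C into '$COUNT_OUT';  -- COUNT_OUT is an assumed parameter name

-- Outside Pig: read row_cnt from the stored output, compute
-- N = floor(row_cnt * 0.2), then run step 2 with
--   pig -param N=<value> (plus the other parameters) segment.pig

-- Step 2 (hypothetical file segment.pig): N is substituted as a constant.
A = load '$DATA' using PigStorage(',') as (id, pvs);
-- get the low 20% segment
D = order A by pvs;
E = limit D $N;
store E into '$Eoutput';
-- get the high 20% segment
F = order A by pvs DESC;
G = limit F $N;
store G into '$Goutput';
```

In this sketch the input is loaded twice, once per script, and a separate job computes the
count; that is the overhead the requested feature would avoid.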


This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
