hadoop-pig-dev mailing list archives

From "Romain Rigaux (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-420) Limit on nothing functionality
Date Thu, 19 Nov 2009 18:15:39 GMT

    [ https://issues.apache.org/jira/browse/PIG-420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780181#action_12780181 ]

Romain Rigaux commented on PIG-420:
-----------------------------------

We have commands that look like Unix commands (e.g. top-queries) and run Pig scripts underneath.
These commands take parameters such as -limit (how many results to return), which the user
specifies as -limit N, where N is an integer.
This is then simply transformed into:
{code}
B = LIMIT A $N;
{code}
It would be nice if we could specify -limit * and have the compiler remove the statement (for
users who want everything). Currently we work around this with a custom limit UDF filter, or
with LIMIT and Integer.MAX_VALUE (soon Long.MAX_VALUE!).
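
Until the compiler supports something like this, a wrapper command can leave the statement out itself. The following is only a minimal Python sketch of that idea, not the commands described above: the function name build_script and the 'input' and 'output' paths are hypothetical placeholders. It emits the LIMIT statement only when a number is given and points the downstream statement at the right alias.

```python
def build_script(limit: str) -> str:
    """Return Pig Latin text for a hypothetical wrapper command.

    `limit` is the value passed to -limit: an integer string, or "*"
    meaning "no limit". Paths and aliases are placeholders.
    """
    lines = ["A = LOAD 'input';"]
    source = "A"  # alias that downstream statements should read from

    if limit != "*":
        n = int(limit)  # raises ValueError for non-integer input
        if n < 0:
            raise ValueError("limit must be non-negative")
        lines.append(f"B = LIMIT A {n};")
        source = "B"

    # Downstream statements refer to whichever alias is current,
    # so omitting LIMIT does not leave a dangling reference to B.
    lines.append(f"STORE {source} INTO 'output';")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_script("100"))
    print(build_script("*"))
```

The drawback is that the alias bookkeeping lives in the wrapper rather than in the Pig script template, which is exactly what a "special value" for LIMIT would avoid.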


> Limit on nothing functionality
> ------------------------------
>
>                 Key: PIG-420
>                 URL: https://issues.apache.org/jira/browse/PIG-420
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Anand Murugappan
>
> Pig 2.0 implements the limit feature, but only as a standalone statement.
> Limit is very useful in debug mode, where we can run queries on a smaller amount of data
> (faster and on fewer nodes) to iron out issues, but in production mode we want to
> run through all the data. It would be good to have an easy "switch" between debug and prod
> mode using the limit statement without having to change the underlying code templates. Because
> LIMIT is a separate standalone statement, it is hard to parametrize the code.
> For instance, a query template might look like:
> A = LOAD '...';
> B = LIMIT A $N;
> C = FOREACH B ....
> In debug mode, we would set the variable $N to 100, but in prod mode we would
> set it to a 'special value' that does not apply LIMIT, letting us run on all
> the data.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

