pig-dev mailing list archives

From "Jie Li (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-2675) Optimization: Remove unnecessary Limit jobs from plan
Date Thu, 28 Jun 2012 18:45:45 GMT

    [ https://issues.apache.org/jira/browse/PIG-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403351#comment-13403351 ]

Jie Li commented on PIG-2675:
-----------------------------

LIMIT is now always compiled into two jobs. We can optimize the 2nd job away at both compile-time and run-time.

{code}
data = LOAD 'queries/1.txt' AS (k, v, x);
selected = LIMIT data 2;
explain selected;
{code}

For this query, LIMIT is compiled into both the map phase and the reduce phase of the 1st job,
whose requestedParallelism is already set to 1, so we don't need to compile the 2nd job at all.

{code}
data = LOAD 'queries/1.txt' AS (k, v, x);
grouped = GROUP data BY k;
selected = LIMIT grouped 2;
explain selected;
{code}

For this query, LIMIT is compiled only into the reduce phase of the 1st job, so we still need to
compile a 2nd job. That 2nd job can be skipped at run-time if the 1st job ends up with only 1 reducer.
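
To make the two cases concrete, here is a minimal sketch of the two checks in Python. It is not Pig's actual compiler code: FirstJob, must_compile_second_job and can_skip_second_job are hypothetical names, and using -1 to mean "parallelism not fixed at compile time" is an assumption made for illustration.

```python
from dataclasses import dataclass

# Assumption: -1 stands for "parallelism not fixed at compile time".
UNSET = -1


@dataclass
class FirstJob:
    """Hypothetical stand-in for the 1st MapReduce job that LIMIT is compiled into."""
    requested_parallelism: int = UNSET


def must_compile_second_job(job: FirstJob) -> bool:
    """Compile-time check: the extra single-reducer LIMIT job is only
    needed when the 1st job is not already pinned to one reducer."""
    return job.requested_parallelism != 1


def can_skip_second_job(actual_reducers: int) -> bool:
    """Run-time check: once the 1st job's real reducer count is known,
    a compiled 2nd job is redundant if that count is 1."""
    return actual_reducers == 1


# LOAD -> LIMIT: requestedParallelism is set to 1 during compilation.
load_then_limit = FirstJob(requested_parallelism=1)
# LOAD -> GROUP -> LIMIT: parallelism is left open here (an assumption).
group_then_limit = FirstJob()

print(must_compile_second_job(load_then_limit))   # False
print(must_compile_second_job(group_then_limit))  # True
print(can_skip_second_job(actual_reducers=1))     # True
print(can_skip_second_job(actual_reducers=4))     # False
```

The first query never needs the 2nd job, while the second query needs it in the plan but may drop it once the real reducer count is known.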

                
> Optimization: Remove unnecessary Limit jobs from plan
> -----------------------------------------------------
>
>                 Key: PIG-2675
>                 URL: https://issues.apache.org/jira/browse/PIG-2675
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Dmitriy V. Ryaboy
>            Assignee: Daniel Dai
>
> Since PIG-2652, the LIMIT operator always inserts a limiting single-reducer job.
> We can optimize this job away when the preceding job has only 1 reducer at run-time.

