hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-627) PERFORMANCE: multi-query optimization
Date Wed, 21 Jan 2009 00:47:59 GMT

    [ https://issues.apache.org/jira/browse/PIG-627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665657#action_12665657 ]

Olga Natkovich commented on PIG-627:

The proposal is as follows:

(1) No changes in interactive mode. Each store or dump starts a job. Interactive mode is not
about efficiency but about ease of use.
(2) In batch mode, a connected set of stores/dumps is executed together. Processing continues
on failures: after the run, the user may learn that some of the queries in the batch succeeded
and some failed. (We need to figure out how to communicate this information to users;
done files are one option.)
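
To make item (2) more concrete, here is a minimal Python sketch, assuming the logical plan is
available as a map from each operator to its inputs. "Connected" is read here as: two stores
belong to the same batch if their plans share any operator. All names (connected_groups,
run_batch, fake_run_store, PLAN) are hypothetical and not Pig code; the per-store status string
stands in for whatever reporting mechanism (done files or otherwise) is eventually chosen.

```python
from typing import Callable, Dict, List


def connected_groups(inputs: Dict[str, List[str]], stores: List[str]) -> List[List[str]]:
    """Group stores whose plans share at least one operator (union-find)."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Treat every input edge as undirected, so stores with a shared ancestor end up together.
    for node, preds in inputs.items():
        find(node)
        for p in preds:
            parent[find(node)] = find(p)

    groups: Dict[str, List[str]] = {}
    for s in stores:
        groups.setdefault(find(s), []).append(s)
    return list(groups.values())


def run_batch(groups: List[List[str]], run_store: Callable[[str], None]) -> Dict[str, str]:
    """Run every group; a failing store is recorded, and the batch keeps going.

    Hypothetical: a real system would compile each group into shared jobs; here
    run_store stands in for the part of that work that produces one store.
    """
    results: Dict[str, str] = {}
    for group in groups:
        for store in group:
            try:
                run_store(store)
                results[store] = "succeeded"
            except Exception as exc:
                results[store] = f"failed: {exc}"
    return results


# The script from the issue, plus an unrelated (hypothetical) store on another input.
PLAN: Dict[str, List[str]] = {
    "A": [],          # A = load 'data'
    "B": ["A"],       # B = filter A by a > 5
    "store1": ["B"],  # store B into 'output1'
    "C": ["B"],       # C = group B by b
    "store2": ["C"],  # store C into 'output2'
    "X": [],          # X = load 'other'        (hypothetical)
    "store3": ["X"],  # store X into 'output3'  (hypothetical)
}


def fake_run_store(store: str) -> None:
    """Simulate one failing store so the continue-on-failure path is exercised."""
    if store == "store2":
        raise RuntimeError("simulated task failure")


GROUPS = connected_groups(PLAN, ["store1", "store2", "store3"])
RESULTS = run_batch(GROUPS, fake_run_store)
print(GROUPS)   # [['store1', 'store2'], ['store3']]
print(RESULTS)
```

Here store1 and store2 land in one batch because both depend on A and B, while store3 runs
separately; the failure of store2 is reported without preventing store1 or store3 from finishing.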

The nice feature of this approach is that it requires no changes on the part of the user.
After the change is implemented, some queries will just run faster.

> PERFORMANCE: multi-query optimization
> -------------------------------------
>                 Key: PIG-627
>                 URL: https://issues.apache.org/jira/browse/PIG-627
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>             Fix For: types_branch
> Currently, if your Pig script contains multiple stores and some shared computation, Pig
> will execute several independent queries. For instance:
> A = load 'data' as (a, b, c);
> B = filter A by a > 5;
> store B into 'output1';
> C = group B by b;
> store C into 'output2';
> This script will result in a map-only job that generates output1 followed by a map-reduce
> job that generates output2. As a result, the data is read, parsed, and filtered twice, which
> is unnecessary and costly.
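
The cost described in the issue can be illustrated with a small Python simulation, assuming
in-memory rows in place of the 'data' input and a scan counter in place of real I/O. It contrasts
the current behaviour (one job per store, each re-reading and re-filtering the input) with a
shared plan that loads and filters once and feeds both outputs. Source, run_separately and
run_shared are hypothetical names, and grouping is simplified to a Python dict rather than Pig's
(group, bag) tuples.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[int, str, str]  # (a, b, c), matching the script's schema


class Source:
    """Stand-in for load 'data'; counts how often the input is scanned."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.scans = 0

    def load(self) -> List[Row]:
        self.scans += 1
        return list(self.rows)


def filter_b(rows: List[Row]) -> List[Row]:
    """B = filter A by a > 5;"""
    return [r for r in rows if r[0] > 5]


def group_c(rows: List[Row]) -> Dict[str, List[Row]]:
    """C = group B by b; (simplified to a dict keyed by b)"""
    groups: Dict[str, List[Row]] = defaultdict(list)
    for r in rows:
        groups[r[1]].append(r)
    return dict(groups)


def run_separately(src: Source) -> Tuple[List[Row], Dict[str, List[Row]]]:
    """Current behaviour: each store re-reads and re-filters the input."""
    output1 = filter_b(src.load())
    output2 = group_c(filter_b(src.load()))
    return output1, output2


def run_shared(src: Source) -> Tuple[List[Row], Dict[str, List[Row]]]:
    """Multi-query behaviour: load and filter once, then feed both stores."""
    b = filter_b(src.load())
    return b, group_c(b)


ROWS: List[Row] = [(7, "x", "p"), (3, "y", "q"), (9, "x", "r"), (6, "z", "s")]
naive_src, shared_src = Source(ROWS), Source(ROWS)
NAIVE = run_separately(naive_src)
SHARED = run_shared(shared_src)
print(naive_src.scans, shared_src.scans)  # 2 1
```

Both plans produce identical outputs; only the number of passes over the input differs, which is
the saving the optimization is after.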

