Date: Thu, 5 Mar 2009 21:23:56 -0800 (PST)
From: "Gunther Hagleitner (JIRA)"
To: pig-dev@hadoop.apache.org
Subject: [jira] Updated: (PIG-627) PERFORMANCE: multi-query optimization

[ https://issues.apache.org/jira/browse/PIG-627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gunther Hagleitner updated PIG-627:
-----------------------------------

    Attachment: file_cmds-0305.patch

This patch is for the multi-query branch
again. It mostly fixes the problem with commands in a script that require immediate execution (in batch mode). So if you do something like:

    ...
    store a into 'tmp_foo';
    ...
    rm tmp_foo
    ...

the rm will trigger execution, and the file will be there for you to delete, copyToLocal, move, etc.

You can also use the "exec" statement without parameters in a script now, to force execution of everything seen so far.

This patch also contains a minor fix to the computation of progress in MR jobs (which I screwed up in the last patch).

> PERFORMANCE: multi-query optimization
> -------------------------------------
>
>                 Key: PIG-627
>                 URL: https://issues.apache.org/jira/browse/PIG-627
>             Project: Pig
>          Issue Type: Improvement
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>             Fix For: types_branch
>
>         Attachments: file_cmds-0305.patch, multi-store-0303.patch, multi-store-0304.patch, multiquery_0223.patch, multiquery_0224.patch
>
>
> Currently, if your Pig script contains multiple stores and some shared computation, Pig will execute several independent queries. For instance:
>
> A = load 'data' as (a, b, c);
> B = filter A by a > 5;
> store B into 'output1';
> C = group B by b;
> store C into 'output2';
>
> This script will result in a map-only job that generates output1, followed by a map-reduce job that generates output2. As a result, the data is read, parsed, and filtered twice, which is unnecessary and costly.
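A minimal sketch of the two execution triggers described in this patch, written as a batch-mode Pig Latin script. It assumes the behavior exactly as stated in the comment above (a parameterless exec forces execution of everything seen so far; rm triggers execution before it runs). The paths 'data', 'tmp_foo' and 'output2' are placeholders borrowed from the examples above, and the alias T is hypothetical. The script has not been checked against a particular Pig release.

```pig
-- Hypothetical batch script; 'data', 'tmp_foo' and 'output2' are placeholder paths.
A = load 'data' as (a, b, c);
B = filter A by a > 5;
store B into 'tmp_foo';

-- exec with no parameters forces execution of everything seen so far,
-- so 'tmp_foo' has been written before the next load reads it.
exec;

-- T is a hypothetical alias that reads back the file stored above.
T = load 'tmp_foo' as (a, b, c);
C = group T by b;
store C into 'output2';

-- rm also triggers execution of the pending store first,
-- so 'output2' is written before 'tmp_foo' is deleted.
rm tmp_foo
```

Reading back 'tmp_foo' is a dependency that goes through the file system rather than through an alias, so forcing execution before the second load is one way to make sure the read sees the file written by the first store.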