Message-ID: <13354483.48541286582431293.JavaMail.jira@thor>
Date: Fri, 8 Oct 2010 20:00:31 -0400 (EDT)
From: "Thejas M Nair (JIRA)"
Reply-To: dev@pig.apache.org
To: pig-dev@hadoop.apache.org
Subject: [jira] Updated: (PIG-1672) order of relations in replicated join gets switched in a query where first relation has two mergeable foreach statements
In-Reply-To: <16728381.29291286477551375.JavaMail.jira@thor>

[
https://issues.apache.org/jira/browse/PIG-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thejas M Nair updated PIG-1672:
-------------------------------

    Status: Patch Available  (was: Open)

Unit tests and test-patch have passed with PIG-1672.2.patch.

> order of relations in replicated join gets switched in a query where first relation has two mergeable foreach statements
> ------------------------------------------------------------------------------------------------------------------------
>
>                 Key: PIG-1672
>                 URL: https://issues.apache.org/jira/browse/PIG-1672
>             Project: Pig
>          Issue Type: Bug
>   Affects Versions: 0.8.0
>            Reporter: Thejas M Nair
>            Assignee: Thejas M Nair
>             Fix For: 0.8.0
>
>         Attachments: PIG-1672.1.patch, PIG-1672.2.patch
>
>
> The replicated join query was running out of memory because the order of relations got switched during logical plan optimization and it was attempting to load the larger (left) relation into memory.
> {code}
> cat replj.pig
> l1 = load 'x' as (a);
> l2 = load 'y' as (b);
> l3 = load 'z' as (a1,b1,c1,d1);
> f1 = foreach l3 generate a1 as a, b1 as b, c1 as c, d1 as d;
> f2 = foreach f1 generate a,b,c;
> j1 = join f2 by a, l1 by a using 'replicated';
> j2 = join j1 by b, l2 by b using 'replicated';
> explain j2;
> Note that in the MR plan printed below, the Load in the MR job with join operations has 'x' as the input instead of 'z'.
> #--------------------------------------------------
> # Map Reduce Plan
> #--------------------------------------------------
> MapReduce node scope-30
> Map Plan
> Store(file:/tmp/temp101387354/tmp-125684214:org.apache.pig.impl.io.InterStorage) - scope-31
> |
> |---l2: Load(file:///Users/tejas/pig-0.8/branch-0.8/y:org.apache.pig.builtin.PigStorage) - scope-17--------
> Global sort: false
> ----------------
> MapReduce node scope-27
> Map Plan
> j2: Store(fakefile:org.apache.pig.builtin.PigStorage) - scope-26
> |
> |---j2: FRJoin[tuple] - scope-20
>     |   |
>     |   Project[bytearray][1] - scope-18
>     |   |
>     |   Project[bytearray][0] - scope-19
>     |
>     |---j1: FRJoin[tuple] - scope-11
>         |   |
>         |   Project[bytearray][0] - scope-9
>         |   |
>         |   Project[bytearray][0] - scope-10
>         |
>         |---l1: Load(file:///Users/tejas/pig-0.8/branch-0.8/x:org.apache.pig.builtin.PigStorage) - scope-0--------
> Global sort: false
> ----------------
> MapReduce node scope-28
> Map Plan
> Store(file:/tmp/temp101387354/tmp-890864787:org.apache.pig.impl.io.InterStorage) - scope-29
> |
> |---f2: New For Each(false,false,false)[bag] - scope-8
>     |   |
>     |   Project[bytearray][0] - scope-2
>     |   |
>     |   Project[bytearray][1] - scope-4
>     |   |
>     |   Project[bytearray][2] - scope-6
>     |
>     |---l3: Load(file:///Users/tejas/pig-0.8/branch-0.8/z:org.apache.pig.builtin.PigStorage) - scope-1--------
> Global sort: false
> ----------------
> {code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.