Message-ID: <2064095509.1251999117463.JavaMail.jira@brutus>
Date: Thu, 3 Sep 2009 10:31:57 -0700 (PDT)
From: "Pradeep Kamath (JIRA)"
To: pig-dev@hadoop.apache.org
Reply-To: pig-dev@hadoop.apache.org
Subject: [jira] Updated: (PIG-578) join ... outer, ...
 outer semantics are a no-ops, should produce corresponding null values
In-Reply-To: <72356819.1230140205135.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

     [ https://issues.apache.org/jira/browse/PIG-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pradeep Kamath updated PIG-578:
-------------------------------

    Fix Version/s: 0.4.0
         Assignee: Pradeep Kamath
           Status: Patch Available  (was: Open)

Attached a patch which will enable left, right and full outer joins in Pig. The syntax will be:

    c = join a by $0 [left|right|full] [outer], b by $0

A few points to note about the syntax, which closely adheres to the SQL standard:
1) The keyword "outer" is optional.
2) These outer joins will only work provided the relations which would need to produce nulls in the case of non-matching keys have a schema.
3) Outer joins will only work for two-way joins. To do a multi-way outer join, users will need to use multiple two-way outer join statements.

The changes are mostly in LogToPhyTranslationVisitor, to handle these types of joins when translating LOJoin. An extra Bincond, which checks whether the bag is empty and substitutes nulls, is introduced in the physical plan. The other area of change is the parser, to enable the above syntax.

> join ... outer, ... outer semantics are a no-ops, should produce corresponding null values
> ------------------------------------------------------------------------------------------
>
>                 Key: PIG-578
>                 URL: https://issues.apache.org/jira/browse/PIG-578
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: David Ciemiewicz
>            Assignee: Pradeep Kamath
>             Fix For: 0.4.0
>
>         Attachments: PIG-578.patch
>
>
> Currently using the "OUTER" modifier in the JOIN statement is a no-op. The results of JOIN are always an INNER join.
> Now that the Pig types branch properly supports null values, the semantics of JOIN ... OUTER, ... OUTER should be corrected to do proper outer joins, populating the corresponding empty values with nulls.
> Here's the example:
> A = load 'a.txt' using PigStorage() as ( comment, value );
> B = load 'b.txt' using PigStorage() as ( comment, value );
> --
> -- OUTER clause is ignored in JOIN statement and does not populate tuple with
> -- null values as it should. Otherwise OUTER is a meaningless no-op modifier.
> --
> ABOuterJoin = join A by ( comment ) outer, B by ( comment ) outer;
> describe ABOuterJoin;
> dump ABOuterJoin;
> The file a.txt contains:
> a-only 1
> ab-both 2
> The file b.txt contains:
> ab-both 2
> b-only 3
> When you execute the script today, the dump results are:
> (ab-both,2,ab-both,2)
> The expected dump results should be:
> (a-only,1,,)
> (ab-both,2,ab-both,2)
> (,,b-only,3)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
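
For readers who want to see the outer-join semantics discussed in this message in executable form, here is a minimal Python sketch. It is not Pig's implementation and does not use any Pig API: the function name outer_join and its parameters are hypothetical, chosen for illustration only. It groups both inputs by key and, where one side's bag is empty, substitutes a single all-null tuple, which is roughly the role the message attributes to the extra Bincond. The tuple widths are passed in explicitly, which mirrors why the null-producing relation needs a schema. The sample data is the a.txt / b.txt content from the issue, with the values assumed to be integers.

```python
def outer_join(left, right, left_width, right_width, kind="full", key=0):
    """Two-way join of lists of tuples on one key column.

    kind is one of "inner", "left", "right", "full".
    left_width / right_width stand in for the schema: they say how many
    nulls to emit when one side has no matching rows.
    """
    # Group both inputs by key (similar in spirit to a cogroup).
    # A plain dict keeps keys in first-seen order.
    groups = {}
    for row in left:
        groups.setdefault(row[key], ([], []))[0].append(row)
    for row in right:
        groups.setdefault(row[key], ([], []))[1].append(row)

    result = []
    for left_bag, right_bag in groups.values():
        # Empty-bag check: if one side is empty and the join kind preserves
        # the other side, replace the empty bag with one all-null tuple.
        if not left_bag and kind in ("right", "full"):
            left_bag = [(None,) * left_width]
        if not right_bag and kind in ("left", "full"):
            right_bag = [(None,) * right_width]
        # Cross product of the two bags; an empty bag yields no rows,
        # which is what drops unmatched keys for the non-preserved side.
        for l_row in left_bag:
            for r_row in right_bag:
                result.append(l_row + r_row)
    return result


# Sample data from the issue description (values assumed to be integers).
A = [("a-only", 1), ("ab-both", 2)]
B = [("ab-both", 2), ("b-only", 3)]

for kind in ("inner", "left", "right", "full"):
    print(kind, outer_join(A, B, 2, 2, kind))
```

With kind="full" the sketch returns the three rows the issue lists as the expected dump, with None standing in for Pig's null; with kind="inner" it returns only the ab-both row, matching the behaviour reported before the patch.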