hadoop-pig-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-578) join ... outer, ... outer semantics are a no-ops, should produce corresponding null values
Date Thu, 03 Sep 2009 17:26:57 GMT

     [ https://issues.apache.org/jira/browse/PIG-578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Pradeep Kamath updated PIG-578:
-------------------------------

    Attachment: PIG-578.patch

> join ... outer, ... outer semantics are a no-ops, should produce corresponding null values
> ------------------------------------------------------------------------------------------
>
>                 Key: PIG-578
>                 URL: https://issues.apache.org/jira/browse/PIG-578
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.2.0
>            Reporter: David Ciemiewicz
>         Attachments: PIG-578.patch
>
>
> Currently using the "OUTER" modifier in the JOIN statement is a no-op.  The resuls of
JOIN are always an INNER join.  Now that the Pig types branch supports null values proper,
the semantics of JOIN ... OUTER, ... OUTER should be corrected to do proper outer joins and
populating the corresponding empty values with nulls.
> Here's the example:
> A = load 'a.txt' using PigStorage() as ( comment, value );
> B = load 'b.txt' using PigStorage() as ( comment, value );
> --
> -- OUTER clause is ignored in JOIN statement and does not populat tuple with
> -- null values as it should. Otherwise OUTER is a meaningless no-op modifier.
> --
> ABOuterJoin = join A by ( comment ) outer, B by ( comment ) outer;
> describe ABOuterJoin;
> dump ABOuterJoin;
> The file a contains:
> a-only  1
> ab-both 2
> The file b contains:
> ab-both 2
> b-only  3
> When you execute the script today, the dump results are:
> (ab-both,2,ab-both,2)
> The expected dump results should be:
> (a-only,1,,)
> (ab-both,2,ab-both,2)
> (,,b-only,3)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message