hadoop-pig-dev mailing list archives

From "Olga Natkovich (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-361) JOIN and cogroup should handle NULLs correctly
Date Thu, 04 Sep 2008 21:59:44 GMT

    [ https://issues.apache.org/jira/browse/PIG-361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628477#action_12628477 ]

Olga Natkovich commented on PIG-361:
------------------------------------

After further discussion, here is what I think is the right thing to do:

(1) Cogroup distinguishes between NULL keys from different relations by creating a separate
record for each relation:

A = load ...
B = load ...
C = cogroup A by $0, B by $0;
...

Assuming that both A and B contain null values in the key column, C would look as follows:

{
....
NULL,  {.....}, {}
NULL, {}, {...}
....
}

The first record corresponds to all records of A with a NULL key, and the second to all
records of B with a NULL key.
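
The grouping rule in (1) can be modelled outside Pig. The following is a minimal Python sketch, not Pig's implementation: the `cogroup` function, the sample rows, and the use of `None` to stand in for NULL are all assumptions made for illustration.

```python
def cogroup(a, b, key=lambda row: row[0]):
    """Group rows of two relations by key (hypothetical helper, not Pig code).

    Non-null keys share one group across both relations. Null keys (None)
    are tagged with the index of the relation they came from, so nulls
    from A and nulls from B never land in the same group.
    """
    groups = {}
    order = []  # remember first-seen order of groups
    for rel_idx, relation in enumerate((a, b)):
        for row in relation:
            k = key(row)
            group_id = ("null", rel_idx) if k is None else ("key", k)
            if group_id not in groups:
                groups[group_id] = (k, [], [])
                order.append(group_id)
            # Slot 1 holds the bag for A, slot 2 the bag for B.
            groups[group_id][1 + rel_idx].append(row)
    return [groups[g] for g in order]


# Made-up sample data: both relations contain null keys.
A = [(1, "a1"), (None, "a2"), (None, "a3")]
B = [(1, "b1"), (None, "b2")]

C = cogroup(A, B)
for record in C:
    print(record)
```

With this data, the null rows of A and the null row of B end up in two separate records, each with an empty bag on the other side, which matches the shape of C shown above.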

(2) This is consistent with SQL semantics, in which NULLs do not compare equal to each other.
JOIN will work as is, and an outer join expressed as COGROUP + FOREACH with Bincond will
work as it did in earlier versions.
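
To see why separate NULL records make JOIN and the outer-join idiom fall out naturally, here is a rough Python model of flattening a cogroup result. It is a sketch under assumptions: the literal `C`, the helper names `inner_join` and `left_outer_join`, and the `b_width` parameter are hypothetical, and the per-group cross product only approximates what flattening both bags does.

```python
from itertools import product

# Cogroup output in the shape (key, bag_from_A, bag_from_B); None models NULL.
# The rows are made up for illustration.
C = [
    (1, [(1, "a1")], [(1, "b1")]),
    (None, [(None, "a2")], []),   # NULL keys from A only
    (None, [], [(None, "b2")]),   # NULL keys from B only
    (2, [(2, "a3")], []),         # key present in A only
]


def inner_join(cogrouped):
    # Cross product of the two bags per group. A group with an empty bag
    # contributes no rows, so the per-relation NULL groups drop out.
    return [ra + rb
            for _, bag_a, bag_b in cogrouped
            for ra, rb in product(bag_a, bag_b)]


def left_outer_join(cogrouped, b_width=2):
    rows = []
    for _, bag_a, bag_b in cogrouped:
        # Bincond-style substitution: an empty B bag becomes one all-null row.
        padded_b = bag_b if bag_b else [(None,) * b_width]
        rows.extend(ra + rb for ra, rb in product(bag_a, padded_b))
    return rows


print(inner_join(C))
print(left_outer_join(C))
```

In this model the inner join never pairs a NULL-keyed row of A with a NULL-keyed row of B, while the left outer join still keeps A's NULL-keyed row, padded with nulls.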

(3) The required work is to add the relation id to the comparison function. Join optimization
already does that, so we will try to piggyback this issue onto join optimization.
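
One way to picture a relation-aware comparison is the Python sketch below. It is only an assumption about the idea, not the comparator used by Pig or by the join optimization: the `(key, relation_id)` tuple layout, the `compare` function, and the choice to sort nulls first are all hypothetical.

```python
from functools import cmp_to_key


def compare(x, y):
    """Compare two (key, relation_id) pairs (hypothetical layout).

    Two null keys are ordered by relation id, so they are equal only when
    they come from the same relation. Non-null keys ignore the relation id.
    """
    (kx, rx), (ky, ry) = x, y
    if kx is None and ky is None:
        return (rx > ry) - (rx < ry)
    if kx is None:
        return -1  # assumption: nulls sort before non-null keys
    if ky is None:
        return 1
    return (kx > ky) - (kx < ky)


# Made-up tagged keys: relation 0 stands for A, relation 1 for B.
tagged = [(3, 0), (None, 1), (None, 0), (3, 1), (None, 0), (1, 1)]
ordered = sorted(tagged, key=cmp_to_key(compare))

# Adjacent entries that compare equal form one group.
groups = []
for item in ordered:
    if groups and compare(groups[-1][-1], item) == 0:
        groups[-1].append(item)
    else:
        groups.append([item])

print(groups)
```

Here the two null keys from relation 0 fall into one group, the null key from relation 1 gets its own group, and the non-null key 3 is still shared across both relations.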

> JOIN and cogroup should handle NULLs correctly
> ----------------------------------------------
>
>                 Key: PIG-361
>                 URL: https://issues.apache.org/jira/browse/PIG-361
>             Project: Pig
>          Issue Type: Sub-task
>    Affects Versions: types_branch
>            Reporter: Pradeep Kamath
>            Assignee: Shravan Matthur Narayanamurthy
>             Fix For: types_branch
>
>
> JOIN should follow SQL semantics, i.e., if the join key is null or part of the join key
> is null in the first table, it should not join with similar keys in the second table.
> Cogroup should coalesce all NULL key rows into one group.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

