hadoop-pig-dev mailing list archives

From "David Ciemiewicz (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-1400) add option for null field JOIN semantics
Date Fri, 30 Apr 2010 18:58:54 GMT
add option for null field JOIN semantics
----------------------------------------

                 Key: PIG-1400
                 URL: https://issues.apache.org/jira/browse/PIG-1400
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.6.0
            Reporter: David Ciemiewicz


Currently JOIN follows SQL semantics for null values in join key fields: nulls are never matched.

However, GROUP ... and COGROUP ... semantics DO match on null values in fields.

This violated the principle of least astonishment for me - I expected JOIN on null value fields
to work.
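
To make the difference concrete, here is a minimal sketch. The file names and schemas are
hypothetical, and it assumes some rows of each input carry a null key. It shows only JOIN and
single-relation GROUP; COGROUP is described above as matching nulls in 0.6.0 as well.

{code}
-- Hypothetical inputs; assume some rows have a null key.
A = LOAD 'a.txt' AS (key:chararray, x:int);
B = LOAD 'b.txt' AS (key:chararray, y:int);

-- Inner JOIN uses SQL semantics: a null key never equals another null key,
-- so null-keyed rows from A and B do not appear in J.
J = JOIN A BY key, B BY key;

-- GROUP collects all null-keyed rows of A into a single group.
G = GROUP A BY key;

DUMP J;
DUMP G;
{code}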

As a workaround, I must now go through ALL of my code to convert chararray null values to
empty strings to get the JOIN to work appropriately.

{code}
A = foreach A generate
    ((a is not null) ? a : '') as a,
    ((b is not null) ? b : '') as b,
    ...
{code}

This is not really a satisfactory workaround.
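
For reference, an end-to-end sketch of that workaround might look like the following. The
relation names, file names, and the extra fields x and y are hypothetical; only the null-to-''
conversion comes from the snippet above.

{code}
-- Hypothetical inputs with a two-part chararray key.
A = LOAD 'a.txt' AS (key:chararray, subkey:chararray, x:int);
B = LOAD 'b.txt' AS (key:chararray, subkey:chararray, y:int);

-- Replace null key fields with '' on both sides so they compare equal.
A1 = FOREACH A GENERATE
    ((key is not null) ? key : '') AS key,
    ((subkey is not null) ? subkey : '') AS subkey,
    x;
B1 = FOREACH B GENERATE
    ((key is not null) ? key : '') AS key,
    ((subkey is not null) ? subkey : '') AS subkey,
    y;

-- Rows whose keys were null now match each other through the '' placeholder.
AB = JOIN A1 BY (key, subkey) FULL OUTER, B1 BY (key, subkey);
{code}

One cost of this approach: after the conversion, keys that were null can no longer be told
apart from keys that were genuinely empty strings.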


My preference is that JOIN support an option (à la FULL, LEFT, RIGHT, OUTER) that directs JOIN
to use null-matching join semantics, just like COGROUP does.

Something like:

{code}
AB = JOIN A by ( key, subkey ) FULL OUTER MATCHNULLS, B by ( key, subkey );
{code}

I don't know whether it should be called JOIN_NULLS, MATCHNULLS, NULLS, NULLSEMANTICS, or
something else.

I just think it would be much cleaner for the end user to be able to get these semantics.

We might also consider being explicit about the SQL null semantics by adding the option SQLNULLS
or NONULLMATCH.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

