hadoop-pig-dev mailing list archives

From "Pradeep Kamath (JIRA)" <j...@apache.org>
Subject [jira] Created: (PIG-429) Self join with implicit split has the join output in wrong order
Date Fri, 12 Sep 2008 22:59:44 GMT
Self join with implicit split has the join output in wrong order
----------------------------------------------------------------

                 Key: PIG-429
                 URL: https://issues.apache.org/jira/browse/PIG-429
             Project: Pig
          Issue Type: Bug
    Affects Versions: types_branch
            Reporter: Pradeep Kamath
             Fix For: types_branch


Query:
{code}
A = load 'st10k' split by 'file';
B = filter A by $1 > 25;
D = join A by $0, B by $0;
dump D;
{code}

In the output, the columns from B are projected first and the columns from A next. On closer
examination of the code, the ImplicitSplitInserter class adds a split operator and two
splitoutput operators to the plan and tries to connect the successors of the load operator
to them. It does this by iterating over the load's successors, disconnecting each one from
the load, and connecting it to the split-splitoutput chain instead. However, the order in
which it gets the successors is NOT the same as the order in which cogroup (join) expects
its inputs. Hence the discrepancy.
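
To make the ordering problem concrete, here is a minimal toy model in Python. It is not Pig's code: the Plan class, its methods, and the node names are all hypothetical, and it assumes that a disconnect followed by a connect appends the new input at the end of the consumer's input list. Under that assumption, rewiring the load's successors one at a time moves the A side of the join behind the B side, while replacing the input in its original slot keeps the order the join expects.

```python
class Plan:
    """Toy operator graph (hypothetical, not Pig's plan classes).

    Each node keeps an ordered list of inputs, because a join cares
    about the position of each input.
    """

    def __init__(self):
        self.preds = {}  # node -> ordered list of inputs
        self.succs = {}  # node -> ordered list of consumers

    def add(self, node):
        self.preds.setdefault(node, [])
        self.succs.setdefault(node, [])

    def connect(self, src, dst):
        self.add(src)
        self.add(dst)
        self.succs[src].append(dst)
        self.preds[dst].append(src)  # assumption: new inputs go last

    def disconnect(self, src, dst):
        self.succs[src].remove(dst)
        self.preds[dst].remove(src)

    def replace_input(self, dst, old, new):
        """Swap one input of dst for another, keeping its position."""
        self.add(new)
        slot = self.preds[dst].index(old)
        self.preds[dst][slot] = new
        self.succs[old].remove(dst)
        self.succs[new].append(dst)


def build_plan():
    """Plan for: B = filter A; D = join A by $0, B by $0."""
    plan = Plan()
    plan.connect("load", "filter")  # B = filter A
    plan.connect("load", "join")    # join input 0: A
    plan.connect("filter", "join")  # join input 1: B
    return plan


def insert_split_naive(plan):
    """Disconnect each successor of the load, then reconnect it."""
    for i, succ in enumerate(list(plan.succs["load"])):
        out = f"splitoutput{i}"
        plan.disconnect("load", succ)
        plan.connect("split", out)
        plan.connect(out, succ)  # lands at the END of succ's inputs
    plan.connect("load", "split")


def insert_split_in_place(plan):
    """Rewire each successor without changing its input positions."""
    for i, succ in enumerate(list(plan.succs["load"])):
        out = f"splitoutput{i}"
        plan.connect("split", out)
        plan.replace_input(succ, "load", out)
    plan.connect("load", "split")


naive = build_plan()
insert_split_naive(naive)
print(naive.preds["join"])     # B side (filter) now comes first

in_place = build_plan()
insert_split_in_place(in_place)
print(in_place.preds["join"])  # A side (split output) stays first
```

In this sketch the naive rewiring leaves the join with the filter as its first input, which mirrors the symptom reported above. Whether the actual fix should preserve positions this way or reorder the inputs afterwards is left open here.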


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

