pig-dev mailing list archives

From "liyunzhang_intel (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (PIG-4284) Enable unit test "TestJoin" for spark
Date Fri, 15 May 2015 01:21:00 GMT

     [ https://issues.apache.org/jira/browse/PIG-4284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

liyunzhang_intel updated PIG-4284:
----------------------------------
    Attachment: PIG-4284.patch

The following change is made in PIG-4284.patch:
1. Add an IndexKey class in GlobalRearrangeConverter.java.

The following unit test failures are fixed by this patch:
org.apache.pig.test.TestJoin.testJoinWithMissingFieldsInTuples
org.apache.pig.test.TestJoin.testJoinNullTupleFieldKey
org.apache.pig.test.TestJoin.testDefaultJoin
org.apache.pig.test.TestJoin.testFullOuterJoin
org.apache.pig.test.TestJoin.testJoinTupleFieldKey
org.apache.pig.test.TestJoin.testLeftOuterJoin
org.apache.pig.test.TestJoin.testJoinSchema
org.apache.pig.test.TestJoin.testRightOuterJoin
org.apache.pig.test.TestProjectRange.testRangeCoGroupMixWSchema
org.apache.pig.test.TestProjectRange.testRangeJoinMixWSchema


Let's use an example to explain why these unit tests failed with the previous code.
leftJoin.pig:
{code}
a = load './a.txt' as (n:chararray, a:int);
b = load './b.txt' as (n:chararray, m:chararray);
c = join a by $0 left outer, b by $0;
d = order c by $1;
store d into './leftJoin.out';
explain d;
{code}

a.txt:
{code}
hello	    1
bye	    2
	    3
{code}

b.txt:
{code}
hello  	world
good	morning
        evening
{code}

Result of spark mode:
{code}
hello	      1	   hello	world
bye	      2		
              3                 evening
{code}

Result of mr mode:
{code}
hello   	 1	  hello	       world
bye	         2
	         3
{code}	
	
The results differ because the previous spark code treated (,3) from table a and (,evening)
from table b as having the same key (NULL) and joined them. In SQL semantics NULL does not
equal NULL, so these two tuples do not have the same key and must not be joined. This case is
handled in PIG-4284.patch.
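
To illustrate the null-key behaviour described above, here is a minimal, self-contained sketch. It is not the IndexKey code from PIG-4284.patch; the names NullKeyJoinDemo, TaggedKey and leftOuterJoin are hypothetical and exist only for this example. The assumption is that a join key is tagged with the index of the input it came from, and that a NULL key only ever groups with NULL keys from that same input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; not the code from PIG-4284.patch.
class NullKeyJoinDemo {

    // Hypothetical key wrapper: a join key tagged with the index of its input.
    static final class TaggedKey {
        final int inputIndex; // 0 = left relation, 1 = right relation
        final Object key;     // may be null

        TaggedKey(int inputIndex, Object key) {
            this.inputIndex = inputIndex;
            this.key = key;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof TaggedKey)) {
                return false;
            }
            TaggedKey other = (TaggedKey) o;
            if (key == null || other.key == null) {
                // A null key matches only null keys from the same input,
                // so a null on the left never joins a null on the right.
                return key == null && other.key == null
                        && inputIndex == other.inputIndex;
            }
            return key.equals(other.key);
        }

        @Override
        public int hashCode() {
            // Non-null keys ignore the input index so both sides hash alike.
            return key == null ? 31 + inputIndex : key.hashCode();
        }
    }

    // Pig prints a null field as an empty string.
    static String show(String s) {
        return s == null ? "" : s;
    }

    // Left outer join on column 0; every row is a two-element String array.
    static List<String> leftOuterJoin(List<String[]> left, List<String[]> right) {
        Map<TaggedKey, List<String[]>> rightByKey = new HashMap<>();
        for (String[] r : right) {
            rightByKey.computeIfAbsent(new TaggedKey(1, r[0]),
                    k -> new ArrayList<>()).add(r);
        }
        List<String> out = new ArrayList<>();
        for (String[] l : left) {
            List<String[]> matches = rightByKey.get(new TaggedKey(0, l[0]));
            if (matches == null) {
                // No match: pad the right-hand columns with nulls.
                out.add("(" + show(l[0]) + "," + show(l[1]) + ",,)");
            } else {
                for (String[] r : matches) {
                    out.add("(" + show(l[0]) + "," + show(l[1]) + ","
                            + show(r[0]) + "," + show(r[1]) + ")");
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> a = Arrays.asList(
                new String[] {"hello", "1"},
                new String[] {"bye", "2"},
                new String[] {null, "3"});
        List<String[]> b = Arrays.asList(
                new String[] {"hello", "world"},
                new String[] {"good", "morning"},
                new String[] {null, "evening"});
        for (String row : leftOuterJoin(a, b)) {
            System.out.println(row);
        }
        // Prints:
        // (hello,1,hello,world)
        // (bye,2,,)
        // (,3,,)
    }
}
```

With the same a.txt and b.txt rows as above, the sketch leaves (,3) unmatched, which agrees with the mr mode result. If equals compared only the key and ignored the input index for nulls, (,3) would pick up (,evening), which is the old spark mode result.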




> Enable unit test "TestJoin" for spark
> -------------------------------------
>
>                 Key: PIG-4284
>                 URL: https://issues.apache.org/jira/browse/PIG-4284
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: liyunzhang_intel
>            Assignee: liyunzhang_intel
>             Fix For: spark-branch
>
>         Attachments: PIG-4284.patch, TEST-org.apache.pig.test.TestJoin.txt
>
>
> error is attached



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
