pig-dev mailing list archives

From "Daniel Dai (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-3875) Fix MergeJoin_8 failure
Date Fri, 11 Apr 2014 02:34:15 GMT

    [ https://issues.apache.org/jira/browse/PIG-3875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13966149#comment-13966149 ]

Daniel Dai commented on PIG-3875:
---------------------------------

As the POMergeJoin documentation states, "This join doesn't support outer join". This issue only
manifests in Tez: in MR, endOfRecordMark is set to STATUS_EOP, whereas in Tez it is set to
STATUS_NULL, which causes the problem (see POMergeJoin.setEndOfRecordMark).

> Fix MergeJoin_8 failure
> -----------------------
>
>                 Key: PIG-3875
>                 URL: https://issues.apache.org/jira/browse/PIG-3875
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Daniel Dai
>            Assignee: Daniel Dai
>             Fix For: tez-branch
>
>         Attachments: PIG-3875-1.patch
>
>
> Here is the exception I get:
> java.lang.NullPointerException
> 	at java.lang.String.compareTo(String.java:1167)
> 	at java.lang.String.compareTo(String.java:92)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMergeJoin.getNextTuple(POMergeJoin.java:489)
> 	at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:300)
> 	at org.apache.pig.backend.hadoop.executionengine.tez.POStoreTez.getNextTuple(POStoreTez.java:90)
> 	at org.apache.pig.backend.hadoop.executionengine.tez.PigProcessor.runPipeline(PigProcessor.java:231)
> 	at org.apache.pig.backend.hadoop.executionengine.tez.PigProcessor.run(PigProcessor.java:155)
> 	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:307)
> 	at org.apache.hadoop.mapred.YarnTezDagChild$5.run(YarnTezDagChild.java:562)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:394)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1495)
> 	at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:551)
> The problem is that after exhausting all the records, we return STATUS_NULL and keep the
> pipeline running, which eventually results in an NPE.
> The patch fixes the issue by outputting EOP when everything is done in POMergeJoin. What we
> did previously was wrong; however, the test did pass before PIG-3568 (I didn't spend time
> figuring out why).
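
The failure mode described above can be illustrated with a small toy model. This is only a sketch and not the actual Pig code: the Status enum, the ToyJoin class, and the drain loop are hypothetical stand-ins for POStatus, POMergeJoin, and the Tez pipeline driver, and the real operator is considerably more involved. It assumes a driver that treats a NULL status as "nothing this time, ask again" and stops only on EOP.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Toy model only: none of these types are the real Pig classes.
final class EndOfRecordMarkSketch {

    // Hypothetical stand-ins for the status codes named in the issue.
    enum Status { OK, NULL, EOP }

    // Hypothetical operator: walks a sorted key list, then returns endMark once.
    static final class ToyJoin {
        private final Iterator<String> keys;
        private final Status endMark;
        private String lastKey = "";
        private boolean finished = false;

        ToyJoin(List<String> sortedKeys, Status endMark) {
            this.keys = sortedKeys.iterator();
            this.endMark = endMark;
        }

        Status next() {
            if (finished) {
                // Called again after the input ran out: there is no current key,
                // so the comparison throws NullPointerException.
                String current = null;
                return lastKey.compareTo(current) <= 0 ? Status.OK : Status.NULL;
            }
            if (!keys.hasNext()) {
                finished = true;
                return endMark; // EOP stops the driver, NULL does not
            }
            lastKey = keys.next();
            return Status.OK;
        }
    }

    // Hypothetical driver loop: only EOP ends the pipeline.
    static int drain(ToyJoin op) {
        int emitted = 0;
        while (true) {
            Status s = op.next();
            if (s == Status.EOP) {
                return emitted;
            }
            if (s == Status.OK) {
                emitted++;
            }
            // Status.NULL falls through and the operator is asked again.
        }
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("a", "b");
        System.out.println("EOP mark: emitted " + drain(new ToyJoin(keys, Status.EOP)));
        try {
            drain(new ToyJoin(keys, Status.NULL));
        } catch (NullPointerException e) {
            System.out.println("NULL mark: NullPointerException after the input ran out");
        }
    }
}
```

With the end mark set to EOP the loop terminates cleanly after the last record. With NULL the driver calls the operator once more, and the comparison against a missing key fails in the same way as the String.compareTo frame in the trace.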




--
This message was sent by Atlassian JIRA
(v6.2#6252)
