pig-dev mailing list archives

From "Adam Szita (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-5180) MergeSparseJoin fails with Spark exec type
Date Fri, 10 Mar 2017 21:40:04 GMT

    [ https://issues.apache.org/jira/browse/PIG-5180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905742#comment-15905742 ]

Adam Szita commented on PIG-5180:
---------------------------------

For the sparse merge join we use IndexedStorage from the piggybank library, which implements
IndexableLoadFunc.
In SparkCompiler#visitMergeJoin we don't call setIndexFile() on the POMergeJoin instance if
the corresponding load func implements IndexableLoadFunc. That's why we end up trying to create
a Path object from a null String in JobGraphBuilder#setReplicationForMergeJoin.
In these cases we don't need to replicate an index file, since it is already stored that way
on HDFS, so a simple null-check will take care of this.
[~kellyzly] can you take a look at [^PIG-5180.0.patch]?
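
A minimal sketch of the kind of null-check meant above. This is not the content of the attached patch: the class MergeJoinReplicationSketch and the method indexFileToReplicate are hypothetical stand-ins, and a plain String takes the place of the index file name held by POMergeJoin. Only the idea is taken from the comment: when no index file was set, skip replication instead of building a Path from null.

```java
import java.util.Optional;

// Hypothetical illustration only; none of these names exist in Pig.
public class MergeJoinReplicationSketch {

    /**
     * Returns the index file that should be replicated, if any.
     * A null argument models the case where setIndexFile() was never
     * called because the load func implements IndexableLoadFunc.
     */
    static Optional<String> indexFileToReplicate(String indexFile) {
        // Null-check: nothing to replicate, so no Path is ever built from null.
        return Optional.ofNullable(indexFile);
    }

    public static void main(String[] args) {
        // No index file set: replication is skipped.
        System.out.println(indexFileToReplicate(null).isPresent());
        // Index file set: the caller may go on to build a Path from it.
        System.out.println(indexFileToReplicate("/tmp/index").isPresent());
    }
}
```

In the real code the guard would presumably sit in front of the Path construction seen in the stack trace; where exactly it belongs is for the patch review to decide.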

> MergeSparseJoin fails with Spark exec type
> ------------------------------------------
>
>                 Key: PIG-5180
>                 URL: https://issues.apache.org/jira/browse/PIG-5180
>             Project: Pig
>          Issue Type: Sub-task
>          Components: spark
>            Reporter: Adam Szita
>            Assignee: Adam Szita
>             Fix For: spark-branch
>
>         Attachments: PIG-5180.0.patch
>
>
> MergeSparseJoin 1 to 6 all fail due to the following exception being thrown on the frontend side:
> {code}
> Caused by: java.lang.IllegalArgumentException: Can not create a Path from a null string
> 	at org.apache.hadoop.fs.Path.checkPathArg(Path.java:122)
> 	at org.apache.hadoop.fs.Path.<init>(Path.java:134)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.setReplicationForMergeJoin(JobGraphBuilder.java:126)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.JobGraphBuilder.visitSparkOp(JobGraphBuilder.java:105)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:140)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.plan.SparkOperator.visit(SparkOperator.java:37)
> 	at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:87)
> 	at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:46)
> 	at org.apache.pig.backend.hadoop.executionengine.spark.SparkLauncher.launchPig(SparkLauncher.java:224)
> 	at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:290)
> 	... 33 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
