pig-dev mailing list archives

From "liyunzhang_intel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PIG-4228) SchemaTupleBackend error when working on a Spark 1.1.0 cluster
Date Tue, 28 Oct 2014 07:56:34 GMT

    [ https://issues.apache.org/jira/browse/PIG-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14186541#comment-14186541 ]

liyunzhang_intel commented on PIG-4228:
---------------------------------------

Hi Carlos Balduz, can you recheck and give detailed steps to reproduce?

> SchemaTupleBackend error when working on a Spark 1.1.0 cluster
> --------------------------------------------------------------
>
>                 Key: PIG-4228
>                 URL: https://issues.apache.org/jira/browse/PIG-4228
>             Project: Pig
>          Issue Type: Bug
>          Components: spark
>    Affects Versions: 0.14.0
>         Environment: spark-1.1.0
>            Reporter: Carlos Balduz
>              Labels: spark
>         Attachments: groupby.pig, movies_data.csv
>
>
> Whenever I try to run a script on a Spark cluster, I get the following error:
> ERROR 0: org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 1.0 failed 4 times, most recent failure: Lost task 2.3 in stage 1.0 (...): java.lang.RuntimeException: org.apache.pig.backend.executionengine.ExecException: ERROR 0: Error while executing ForEach at [1-2[-1,-1]]
>         org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:62)
>         org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
>         scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>         scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>         scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:29)
>         org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.readNext(POOutputConsumerIterator.java:34)
>         org.apache.pig.backend.hadoop.executionengine.spark.converter.POOutputConsumerIterator.hasNext(POOutputConsumerIterator.java:68)
>         scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:41)
>         scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>         scala.collection.Iterator$class.foreach(Iterator.scala:727)
>         scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>         org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>         org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>         org.apache.spark.scheduler.Task.run(Task.scala:54)
>         org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>         java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         java.lang.Thread.run(Thread.java:745)
> After debugging, I have seen that the problem is inside SchemaTupleBackend. Although SparkLauncher initializes this class on the driver, that initialization is lost when the job is sent to the executors, so when POOutputConsumerIterator tries to fetch the results, the call to SchemaTupleBackend.newSchemaTupleFactory(...) throws a RuntimeException.
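
A minimal sketch of the kind of executor-side guard that would avoid the RuntimeException described above, assuming the two-argument SchemaTupleBackend.initialize(Configuration, PigContext) entry point that the MR backend calls during task setup is available in this Pig version. The SchemaTupleBackendInit class and its wiring into the Spark converters are hypothetical, not code from the Pig tree:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.pig.data.SchemaTupleBackend;
    import org.apache.pig.impl.PigContext;

    // Hypothetical executor-side guard: SparkLauncher initializes
    // SchemaTupleBackend in the driver JVM only, so each executor JVM has to
    // initialize it once itself before POOutputConsumerIterator pulls tuples.
    public final class SchemaTupleBackendInit {

        private static volatile boolean initialized = false;

        private SchemaTupleBackendInit() {}

        public static void ensureInitialized(Configuration conf, PigContext pigContext) {
            if (initialized) {
                return; // already done in this executor JVM
            }
            synchronized (SchemaTupleBackendInit.class) {
                if (!initialized) {
                    try {
                        // Same call the MR backend makes in task setup; without
                        // it, newSchemaTupleFactory(...) throws a RuntimeException
                        // on the executor. Broad catch because the declared
                        // checked exceptions differ across Pig versions.
                        SchemaTupleBackend.initialize(conf, pigContext);
                    } catch (Exception e) {
                        throw new RuntimeException(
                                "SchemaTupleBackend initialization failed on executor", e);
                    }
                    initialized = true;
                }
            }
        }
    }

Calling ensureInitialized(conf, pigContext) from the converter closures before the first tuple is read would mirror what the MR backend does in its mapper/reducer setup.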



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
