spark-issues mailing list archives

From "Sean Owen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-18075) UDF doesn't work on non-local spark
Date Wed, 11 Jan 2017 16:05:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-18075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818720#comment-15818720
] 

Sean Owen commented on SPARK-18075:
-----------------------------------

Yes, spark-shell is submitted the same way. If you write code that does its work given
an existing SparkContext/SparkSession and then invoke it in the shell, it should be fine.
I think this issue was about launching a Spark job by running a class directly, as if it
were any other program. That can also work, but it may require additional work to achieve
a comparable setup.
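
As a rough illustration of the distinction above, here is a minimal sketch, not taken from the issue itself. The names UdfJob and run, and the use of the two program arguments for the master URL and the application jar path, are hypothetical. Setting spark.jars so that executors can load the application's classes is only an assumption about what the "additional work" might involve when a class is run directly; it is not a confirmed fix for this issue.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.expr

// Hypothetical example object; the name and layout are illustrative only.
object UdfJob {

  // Does its work given an existing SparkSession. It sets no master and
  // creates no session, so it can be invoked from spark-shell as well.
  def run(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val df = Seq("one", "two").toDF("name")
    spark.udf.register("func", (name: String) => name.toUpperCase)
    df.withColumn("upperName", expr("func(name)"))
  }

  // Entry point for running the class directly, like any other program.
  // Assumption: args(0) is the master URL and args(1) is the path to the
  // packaged application jar; both are placeholders for this sketch.
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master(args(0))
      // spark.jars lists jars to place on the driver and executor classpaths.
      .config("spark.jars", args(1))
      .getOrCreate()
    try run(spark).show() finally spark.stop()
  }
}
```

In spark-shell, where a session named spark already exists, the same logic could be invoked as UdfJob.run(spark).show() once the jar containing UdfJob is on the shell's classpath.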

> UDF doesn't work on non-local spark
> -----------------------------------
>
>                 Key: SPARK-18075
>                 URL: https://issues.apache.org/jira/browse/SPARK-18075
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 1.6.1, 2.0.0
>            Reporter: Nick Orka
>
> I have this issue with Spark 2.0.0 (spark-2.0.0-bin-hadoop2.7.tar.gz).
> According to this ticket, https://issues.apache.org/jira/browse/SPARK-9219, I've given all
> Spark dependencies PROVIDED scope. The app uses exactly the same Spark version as the
> Spark server.
> Here is my pom:
> {code:title=pom.xml}
> <properties>
>         <maven.compiler.source>1.6</maven.compiler.source>
>         <maven.compiler.target>1.6</maven.compiler.target>
>         <encoding>UTF-8</encoding>
>         <scala.version>2.11.8</scala.version>
>         <spark.version>2.0.0</spark.version>
>         <hadoop.version>2.7.0</hadoop.version>
>     </properties>
>     <dependencies>
>         <!--Spark-->
>         <dependency>
>             <groupId>org.apache.spark</groupId>
>             <artifactId>spark-core_2.11</artifactId>
>             <version>${spark.version}</version>
>             <scope>provided</scope>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.spark</groupId>
>             <artifactId>spark-sql_2.11</artifactId>
>             <version>${spark.version}</version>
>             <scope>provided</scope>
>         </dependency>
>         <dependency>
>             <groupId>org.apache.spark</groupId>
>             <artifactId>spark-hive_2.11</artifactId>
>             <version>${spark.version}</version>
>             <scope>provided</scope>
>         </dependency>
>     </dependencies>
> {code}
> As you can see, all Spark dependencies have provided scope.
> And this is the code for reproduction:
> {code:title=udfTest.scala}
> import org.apache.spark.sql.types.{StringType, StructField, StructType}
> import org.apache.spark.sql.{Row, SparkSession}
> /**
>   * Created by nborunov on 10/19/16.
>   */
> object udfTest {
>   class Seq extends Serializable {
>     var i = 0
>     def getVal: Int = {
>       i = i + 1
>       i
>     }
>   }
>   def main(args: Array[String]) {
>     val spark = SparkSession
>       .builder()
>             .master("spark://nborunov-mbp.local:7077")
> //      .master("local")
>       .getOrCreate()
>     val rdd = spark.sparkContext.parallelize(Seq(Row("one"), Row("two")))
>     val schema = StructType(Array(StructField("name", StringType)))
>     val df = spark.createDataFrame(rdd, schema)
>     df.show()
>     spark.udf.register("func", (name: String) => name.toUpperCase)
>     import org.apache.spark.sql.functions.expr
>     val newDf = df.withColumn("upperName", expr("func(name)"))
>     newDf.show()
>     val seq = new Seq
>     spark.udf.register("seq", () => seq.getVal)
>     val seqDf = df.withColumn("id", expr("seq()"))
>     seqDf.show()
>     df.createOrReplaceTempView("df")
>     spark.sql("select *, seq() as sql_id from df").show()
>   }
> }
> {code}
> With .master("local"), everything works fine. With .master("spark://...:7077"), it fails
> on this line:
> {code}
> newDf.show()
> {code}
> The error is exactly the same:
> {code}
> scala> udfTest.main(Array())
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in [jar:file:/Users/nborunov/.m2/repository/org/slf4j/slf4j-log4j12/1.7.16/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in [jar:file:/Users/nborunov/.m2/repository/ch/qos/logback/logback-classic/1.1.7/logback-classic-1.1.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
> SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
> 16/10/19 19:37:52 INFO SparkContext: Running Spark version 2.0.0
> 16/10/19 19:37:52 WARN NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
> 16/10/19 19:37:52 INFO SecurityManager: Changing view acls to: nborunov
> 16/10/19 19:37:52 INFO SecurityManager: Changing modify acls to: nborunov
> 16/10/19 19:37:52 INFO SecurityManager: Changing view acls groups to: 
> 16/10/19 19:37:52 INFO SecurityManager: Changing modify acls groups to: 
> 16/10/19 19:37:52 INFO SecurityManager: SecurityManager: authentication disabled; ui
acls disabled; users  with view permissions: Set(nborunov); groups with view permissions:
Set(); users  with modify permissions: Set(nborunov); groups with modify permissions: Set()
> 16/10/19 19:37:53 INFO Utils: Successfully started service 'sparkDriver' on port 57828.
> 16/10/19 19:37:53 INFO SparkEnv: Registering MapOutputTracker
> 16/10/19 19:37:53 INFO SparkEnv: Registering BlockManagerMaster
> 16/10/19 19:37:53 INFO DiskBlockManager: Created local directory at /private/var/folders/hl/2fv6555n2w92272zywwvpbzh0000gq/T/blockmgr-f2d05423-b7f7-4525-b41e-10dfe2f88264
> 16/10/19 19:37:53 INFO MemoryStore: MemoryStore started with capacity 2004.6 MB
> 16/10/19 19:37:53 INFO SparkEnv: Registering OutputCommitCoordinator
> 16/10/19 19:37:54 INFO Utils: Successfully started service 'SparkUI' on port 4040.
> 16/10/19 19:37:54 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://192.168.2.202:4040
> 16/10/19 19:37:54 INFO StandaloneAppClient$ClientEndpoint: Connecting to master spark://nborunov-mbp.local:7077...
> 16/10/19 19:37:54 INFO TransportClientFactory: Successfully created connection to nborunov-mbp.local/192.168.2.202:7077
after 74 ms (0 ms spent in bootstraps)
> 16/10/19 19:37:55 INFO StandaloneSchedulerBackend: Connected to Spark cluster with app
ID app-20161019153755-0017
> 16/10/19 19:37:55 INFO StandaloneAppClient$ClientEndpoint: Executor added: app-20161019153755-0017/0
on worker-20161018232014-192.168.2.202-61437 (192.168.2.202:61437) with 4 cores
> 16/10/19 19:37:55 INFO StandaloneSchedulerBackend: Granted executor ID app-20161019153755-0017/0
on hostPort 192.168.2.202:61437 with 4 cores, 1024.0 MB RAM
> 16/10/19 19:37:55 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService'
on port 57832.
> 16/10/19 19:37:55 INFO NettyBlockTransferService: Server created on 192.168.2.202:57832
> 16/10/19 19:37:55 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver,
192.168.2.202, 57832)
> 16/10/19 19:37:55 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.2.202:57832
with 2004.6 MB RAM, BlockManagerId(driver, 192.168.2.202, 57832)
> 16/10/19 19:37:55 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver,
192.168.2.202, 57832)
> 16/10/19 19:37:55 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20161019153755-0017/0
is now RUNNING
> 16/10/19 19:37:55 INFO StandaloneSchedulerBackend: SchedulerBackend is ready for scheduling
beginning after reached minRegisteredResourcesRatio: 0.0
> 16/10/19 19:37:55 WARN SparkContext: Use an existing SparkContext, some configuration
may not take effect.
> 16/10/19 19:37:56 INFO HiveSharedState: spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir
is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('/user/hive/warehouse').
> 16/10/19 19:37:56 INFO HiveSharedState: Warehouse path is '/user/hive/warehouse'.
> 16/10/19 19:37:58 INFO HiveUtils: Initializing HiveMetastoreConnection version 1.2.1
using Spark classes.
> 16/10/19 19:37:58 INFO deprecation: mapred.max.split.size is deprecated. Instead, use
mapreduce.input.fileinputformat.split.maxsize
> 16/10/19 19:37:58 INFO deprecation: mapred.reduce.tasks.speculative.execution is deprecated.
Instead, use mapreduce.reduce.speculative
> 16/10/19 19:37:58 INFO deprecation: mapred.committer.job.setup.cleanup.needed is deprecated.
Instead, use mapreduce.job.committer.setup.cleanup.needed
> 16/10/19 19:37:58 INFO deprecation: mapred.min.split.size.per.rack is deprecated. Instead,
use mapreduce.input.fileinputformat.split.minsize.per.rack
> 16/10/19 19:37:58 INFO deprecation: mapred.min.split.size is deprecated. Instead, use
mapreduce.input.fileinputformat.split.minsize
> 16/10/19 19:37:58 INFO deprecation: mapred.min.split.size.per.node is deprecated. Instead,
use mapreduce.input.fileinputformat.split.minsize.per.node
> 16/10/19 19:37:58 INFO deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
> 16/10/19 19:37:58 INFO deprecation: mapred.input.dir.recursive is deprecated. Instead,
use mapreduce.input.fileinputformat.input.dir.recursive
> 16/10/19 19:37:59 INFO metastore: Trying to connect to metastore with URI thrift://ip-10-100-102-90.iad.sessionm.com:9083
> 16/10/19 19:37:59 INFO metastore: Connected to metastore.
> 16/10/19 19:38:00 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor
NettyRpcEndpointRef(null) (192.168.2.202:57835) with ID 0
> 16/10/19 19:38:00 INFO BlockManagerMasterEndpoint: Registering block manager 192.168.2.202:57837
with 366.3 MB RAM, BlockManagerId(0, 192.168.2.202, 57837)
> 16/10/19 19:38:01 WARN BlockReaderLocal: The short-circuit local reads feature cannot
be used because libhadoop cannot be loaded.
> 16/10/19 19:38:01 INFO SessionState: Created local directory: /var/folders/hl/2fv6555n2w92272zywwvpbzh0000gq/T/e1377cbe-3c79-4a44-b0be-551f2b73b931_resources
> 16/10/19 19:38:01 INFO SessionState: Created HDFS directory: /tmp/hive/nborunov/e1377cbe-3c79-4a44-b0be-551f2b73b931
> 16/10/19 19:38:01 INFO SessionState: Created local directory: /var/folders/hl/2fv6555n2w92272zywwvpbzh0000gq/T/nborunov/e1377cbe-3c79-4a44-b0be-551f2b73b931
> 16/10/19 19:38:01 INFO SessionState: Created HDFS directory: /tmp/hive/nborunov/e1377cbe-3c79-4a44-b0be-551f2b73b931/_tmp_space.db
> 16/10/19 19:38:01 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1)
is /user/hive/warehouse
> 16/10/19 19:38:02 INFO SessionState: Created local directory: /var/folders/hl/2fv6555n2w92272zywwvpbzh0000gq/T/4cdb5e78-de4b-4919-b490-4f414c129ed1_resources
> 16/10/19 19:38:02 INFO SessionState: Created HDFS directory: /tmp/hive/nborunov/4cdb5e78-de4b-4919-b490-4f414c129ed1
> 16/10/19 19:38:02 INFO SessionState: Created local directory: /var/folders/hl/2fv6555n2w92272zywwvpbzh0000gq/T/nborunov/4cdb5e78-de4b-4919-b490-4f414c129ed1
> 16/10/19 19:38:02 INFO SessionState: Created HDFS directory: /tmp/hive/nborunov/4cdb5e78-de4b-4919-b490-4f414c129ed1/_tmp_space.db
> 16/10/19 19:38:02 INFO HiveClientImpl: Warehouse location for Hive client (version 1.2.1)
is /user/hive/warehouse
> 16/10/19 19:38:03 INFO SparkContext: Starting job: show at udfTest.scala:36
> 16/10/19 19:38:03 INFO DAGScheduler: Got job 0 (show at udfTest.scala:36) with 1 output
partitions
> 16/10/19 19:38:03 INFO DAGScheduler: Final stage: ResultStage 0 (show at udfTest.scala:36)
> 16/10/19 19:38:03 INFO DAGScheduler: Parents of final stage: List()
> 16/10/19 19:38:03 INFO DAGScheduler: Missing parents: List()
> 16/10/19 19:38:03 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[3] at
show at udfTest.scala:36), which has no missing parents
> 16/10/19 19:38:03 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated
size 6.9 KB, free 2004.6 MB)
> 16/10/19 19:38:03 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory
(estimated size 3.8 KB, free 2004.6 MB)
> 16/10/19 19:38:03 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.2.202:57832
(size: 3.8 KB, free: 2004.6 MB)
> 16/10/19 19:38:03 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1012
> 16/10/19 19:38:03 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[3]
at show at udfTest.scala:36)
> 16/10/19 19:38:03 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
> 16/10/19 19:38:04 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, 192.168.2.202,
partition 0, PROCESS_LOCAL, 5381 bytes)
> 16/10/19 19:38:04 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 0
on executor id: 0 hostname: 192.168.2.202.
> 16/10/19 19:38:04 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.2.202:57837
(size: 3.8 KB, free: 366.3 MB)
> 16/10/19 19:38:07 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 3235
ms on 192.168.2.202 (1/1)
> 16/10/19 19:38:07 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed,
from pool 
> 16/10/19 19:38:07 INFO DAGScheduler: ResultStage 0 (show at udfTest.scala:36) finished
in 3.265 s
> 16/10/19 19:38:07 INFO DAGScheduler: Job 0 finished: show at udfTest.scala:36, took 3.629356
s
> 16/10/19 19:38:07 INFO SparkContext: Starting job: show at udfTest.scala:36
> 16/10/19 19:38:07 INFO DAGScheduler: Got job 1 (show at udfTest.scala:36) with 1 output
partitions
> 16/10/19 19:38:07 INFO DAGScheduler: Final stage: ResultStage 1 (show at udfTest.scala:36)
> 16/10/19 19:38:07 INFO DAGScheduler: Parents of final stage: List()
> 16/10/19 19:38:07 INFO DAGScheduler: Missing parents: List()
> 16/10/19 19:38:07 INFO DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[3] at
show at udfTest.scala:36), which has no missing parents
> 16/10/19 19:38:07 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated
size 6.9 KB, free 2004.6 MB)
> 16/10/19 19:38:07 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory
(estimated size 3.8 KB, free 2004.6 MB)
> 16/10/19 19:38:07 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.2.202:57832
(size: 3.8 KB, free: 2004.6 MB)
> 16/10/19 19:38:07 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1012
> 16/10/19 19:38:07 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[3]
at show at udfTest.scala:36)
> 16/10/19 19:38:07 INFO TaskSchedulerImpl: Adding task set 1.0 with 1 tasks
> 16/10/19 19:38:07 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, 192.168.2.202,
partition 1, PROCESS_LOCAL, 5381 bytes)
> 16/10/19 19:38:07 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 1
on executor id: 0 hostname: 192.168.2.202.
> 16/10/19 19:38:07 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 192.168.2.202:57837
(size: 3.8 KB, free: 366.3 MB)
> 16/10/19 19:38:07 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 85 ms
on 192.168.2.202 (1/1)
> 16/10/19 19:38:07 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed,
from pool 
> 16/10/19 19:38:07 INFO DAGScheduler: ResultStage 1 (show at udfTest.scala:36) finished
in 0.087 s
> 16/10/19 19:38:07 INFO DAGScheduler: Job 1 finished: show at udfTest.scala:36, took 0.103358
s
> 16/10/19 19:38:07 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.2.202:57832
in memory (size: 3.8 KB, free: 2004.6 MB)
> 16/10/19 19:38:07 INFO BlockManagerInfo: Removed broadcast_0_piece0 on 192.168.2.202:57837
in memory (size: 3.8 KB, free: 366.3 MB)
> 16/10/19 19:38:07 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.2.202:57832
in memory (size: 3.8 KB, free: 2004.6 MB)
> 16/10/19 19:38:07 INFO BlockManagerInfo: Removed broadcast_1_piece0 on 192.168.2.202:57837
in memory (size: 3.8 KB, free: 366.3 MB)
> 16/10/19 19:38:08 INFO CodeGenerator: Code generated in 638.80317 ms
> +----+
> |name|
> +----+
> | one|
> | two|
> +----+
> 16/10/19 19:38:08 INFO SparkSqlParser: Parsing command: func(name)
> 16/10/19 19:38:09 INFO CodeGenerator: Code generated in 51.788495 ms
> 16/10/19 19:38:09 INFO SparkContext: Starting job: show at udfTest.scala:44
> 16/10/19 19:38:09 INFO DAGScheduler: Got job 2 (show at udfTest.scala:44) with 1 output
partitions
> 16/10/19 19:38:09 INFO DAGScheduler: Final stage: ResultStage 2 (show at udfTest.scala:44)
> 16/10/19 19:38:09 INFO DAGScheduler: Parents of final stage: List()
> 16/10/19 19:38:09 INFO DAGScheduler: Missing parents: List()
> 16/10/19 19:38:09 INFO DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[6] at
show at udfTest.scala:44), which has no missing parents
> 16/10/19 19:38:09 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated
size 11.4 KB, free 2004.6 MB)
> 16/10/19 19:38:09 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory
(estimated size 5.7 KB, free 2004.6 MB)
> 16/10/19 19:38:09 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.2.202:57832
(size: 5.7 KB, free: 2004.6 MB)
> 16/10/19 19:38:09 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1012
> 16/10/19 19:38:09 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (MapPartitionsRDD[6]
at show at udfTest.scala:44)
> 16/10/19 19:38:09 INFO TaskSchedulerImpl: Adding task set 2.0 with 1 tasks
> 16/10/19 19:38:09 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, 192.168.2.202,
partition 0, PROCESS_LOCAL, 5381 bytes)
> 16/10/19 19:38:09 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 2
on executor id: 0 hostname: 192.168.2.202.
> 16/10/19 19:38:09 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on 192.168.2.202:57837
(size: 5.7 KB, free: 366.3 MB)
> 16/10/19 19:38:09 WARN TaskSetManager: Lost task 0.0 in stage 2.0 (TID 2, 192.168.2.202):
java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy
to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq
in instance of org.apache.spark.rdd.MapPartitionsRDD
> 	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
> 	at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
> 	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
> 	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:85)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> 16/10/19 19:38:09 INFO TaskSetManager: Starting task 0.1 in stage 2.0 (TID 3, 192.168.2.202,
partition 0, PROCESS_LOCAL, 5381 bytes)
> 16/10/19 19:38:09 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 3
on executor id: 0 hostname: 192.168.2.202.
> 16/10/19 19:38:09 INFO TaskSetManager: Lost task 0.1 in stage 2.0 (TID 3) on executor
192.168.2.202: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy
to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq
in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 1]
> 16/10/19 19:38:09 INFO TaskSetManager: Starting task 0.2 in stage 2.0 (TID 4, 192.168.2.202,
partition 0, PROCESS_LOCAL, 5381 bytes)
> 16/10/19 19:38:09 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 4
on executor id: 0 hostname: 192.168.2.202.
> 16/10/19 19:38:09 INFO TaskSetManager: Lost task 0.2 in stage 2.0 (TID 4) on executor
192.168.2.202: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy
to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq
in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 2]
> 16/10/19 19:38:09 INFO TaskSetManager: Starting task 0.3 in stage 2.0 (TID 5, 192.168.2.202,
partition 0, PROCESS_LOCAL, 5381 bytes)
> 16/10/19 19:38:09 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Launching task 5
on executor id: 0 hostname: 192.168.2.202.
> 16/10/19 19:38:09 INFO TaskSetManager: Lost task 0.3 in stage 2.0 (TID 5) on executor
192.168.2.202: java.lang.ClassCastException (cannot assign instance of scala.collection.immutable.List$SerializationProxy
to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq
in instance of org.apache.spark.rdd.MapPartitionsRDD) [duplicate 3]
> 16/10/19 19:38:09 ERROR TaskSetManager: Task 0 in stage 2.0 failed 4 times; aborting
job
> 16/10/19 19:38:09 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed,
from pool 
> 16/10/19 19:38:09 INFO TaskSchedulerImpl: Cancelling stage 2
> 16/10/19 19:38:09 INFO DAGScheduler: ResultStage 2 (show at udfTest.scala:44) failed
in 0.354 s
> 16/10/19 19:38:09 INFO DAGScheduler: Job 2 failed: show at udfTest.scala:44, took 0.373604
s
> org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0
failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 5, 192.168.2.202): java.lang.ClassCastException:
cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_
of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
> 	at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
> 	at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> 	at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
> 	at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
> 	at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
> 	at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
> 	at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
> 	at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
> 	at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
> 	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
> 	at org.apache.spark.scheduler.Task.run(Task.scala:85)
> 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> 	at java.lang.Thread.run(Thread.java:745)
> Driver stacktrace:
>   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>   at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
>   at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
>   at scala.Option.foreach(Option.scala:257)
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
>   at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1871)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1884)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1897)
>   at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:347)
>   at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:39)
>   at org.apache.spark.sql.Dataset$$anonfun$org$apache$spark$sql$Dataset$$execute$1$1.apply(Dataset.scala:2183)
>   at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
>   at org.apache.spark.sql.Dataset.withNewExecutionId(Dataset.scala:2532)
>   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$execute$1(Dataset.scala:2182)
>   at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collect(Dataset.scala:2189)
>   at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1925)
>   at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:1924)
>   at org.apache.spark.sql.Dataset.withTypedCallback(Dataset.scala:2562)
>   at org.apache.spark.sql.Dataset.head(Dataset.scala:1924)
>   at org.apache.spark.sql.Dataset.take(Dataset.scala:2139)
>   at org.apache.spark.sql.Dataset.showString(Dataset.scala:239)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:526)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:486)
>   at org.apache.spark.sql.Dataset.show(Dataset.scala:495)
>   at udfTest$.main(udfTest.scala:44)
>   ... 29 elided
> Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy
to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq
in instance of org.apache.spark.rdd.MapPartitionsRDD
>   at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
>   at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2006)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>   at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
>   at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
>   at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
>   at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
>   at org.apache.spark.scheduler.Task.run(Task.scala:85)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>   at java.lang.Thread.run(Thread.java:745)
> scala> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


