spark-issues mailing list archives

From "Damian Momot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-19293) Spark 2.1.x unstable with spark.speculation=true
Date Fri, 26 May 2017 08:23:04 GMT

    [ https://issues.apache.org/jira/browse/SPARK-19293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025974#comment-16025974 ]

Damian Momot commented on SPARK-19293:
--------------------------------------

Some additional errors observed for speculative tasks:

{code}
java.lang.reflect.UndeclaredThrowableException
	at com.sun.proxy.$Proxy18.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2102)
	at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1214)
	at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1210)
	at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1210)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:386)
	at org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
	at org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:145)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:399)
	at org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat$$anonfun$buildReader$1.apply(ParquetFileFormat.scala:355)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:168)
	at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:109)
	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
	at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:126)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:99)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.InterruptedException: sleep interrupted
	at java.lang.Thread.sleep(Native Method)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:159)
	... 28 more
{code}
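
For context on the first trace: the {{Caused by: java.lang.InterruptedException: sleep interrupted}} at the bottom indicates the speculative task was killed while Hadoop's RetryInvocationHandler was sleeping between retries. Because that handler sits behind a java.lang.reflect dynamic proxy ({{com.sun.proxy.$Proxy18}}) and the proxied method does not declare InterruptedException, the proxy wraps it in UndeclaredThrowableException. A minimal sketch of that wrapping mechanism (FileInfoClient is a hypothetical stand-in, not the HDFS API):

{code}
// Hypothetical sketch (FileInfoClient is a stand-in, not the HDFS API): a checked
// exception thrown inside a java.lang.reflect.Proxy handler is wrapped in
// UndeclaredThrowableException when the interface method does not declare it --
// the same mechanism that produces the wrapping seen in the trace above.
import java.lang.reflect.{InvocationHandler, Method, Proxy}

trait FileInfoClient { def getFileInfo(path: String): String }

object UndeclaredThrowableRepro extends App {
  val handler = new InvocationHandler {
    // Simulates the retry handler being interrupted mid-sleep.
    def invoke(proxy: AnyRef, method: Method, args: Array[AnyRef]): AnyRef =
      throw new InterruptedException("sleep interrupted")
  }

  val client = Proxy
    .newProxyInstance(getClass.getClassLoader, Array[Class[_]](classOf[FileInfoClient]), handler)
    .asInstanceOf[FileInfoClient]

  // Throws java.lang.reflect.UndeclaredThrowableException
  // caused by java.lang.InterruptedException: sleep interrupted
  client.getFileInfo("/some/path")
}
{code}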

{code}
java.lang.IllegalArgumentException: requirement failed: cannot write to a closed ByteBufferOutputStream
	at scala.Predef$.require(Predef.scala:224)
	at org.apache.spark.util.ByteBufferOutputStream.write(ByteBufferOutputStream.scala:40)
	at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
	at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
	at java.io.ObjectOutputStream.writeFatalException(ObjectOutputStream.java:1580)
	at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:351)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
	at org.apache.spark.rpc.netty.NettyRpcEnv.serialize(NettyRpcEnv.scala:253)
	at org.apache.spark.rpc.netty.NettyRpcEnv.send(NettyRpcEnv.scala:192)
	at org.apache.spark.rpc.netty.NettyRpcEndpointRef.send(NettyRpcEnv.scala:512)
	at org.apache.spark.executor.CoarseGrainedExecutorBackend.statusUpdate(CoarseGrainedExecutorBackend.scala:142)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:409)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{code}
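
The second trace fails while the executor serializes a status update for the killed task through the RPC layer. A simplified sketch of the kind of guard that produces this message (an assumption inferred from the error text and scala.Predef.require semantics, not the actual org.apache.spark.util.ByteBufferOutputStream source):

{code}
// Simplified sketch (assumption, not the exact Spark source): once the stream is
// closed, any further write fails with
// "requirement failed: cannot write to a closed ByteBufferOutputStream",
// which is exactly the message in the trace above.
import java.io.ByteArrayOutputStream

class ByteBufferOutputStreamSketch extends ByteArrayOutputStream {
  private var closed = false

  override def write(b: Int): Unit = {
    require(!closed, "cannot write to a closed ByteBufferOutputStream")
    super.write(b)
  }

  override def write(b: Array[Byte], off: Int, len: Int): Unit = {
    require(!closed, "cannot write to a closed ByteBufferOutputStream")
    super.write(b, off, len)
  }

  override def close(): Unit = {
    closed = true
    super.close()
  }
}
{code}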

> Spark 2.1.x unstable with spark.speculation=true
> ------------------------------------------------
>
>                 Key: SPARK-19293
>                 URL: https://issues.apache.org/jira/browse/SPARK-19293
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.1.0, 2.1.1
>            Reporter: Damian Momot
>            Priority: Critical
>
> After upgrading from Spark 2.0.2 to 2.1.0 we've observed that jobs often fail when
> speculative mode is enabled.
> In 2.0.2, speculative tasks were simply skipped if their result was not used (i.e. the
> other instance finished earlier), and the UI clearly showed that those tasks were not
> counted as failures.
> In 2.1.0, many tasks are marked failed/killed when speculative tasks start to run (that
> is, at the end of a stage when there are spare executors to use), which also leads to
> entire stage/job failures.
> Disabling spark.speculation works around the failures, but speculative mode is very
> useful, especially when different executors run on machines with varying load (for
> example in YARN).
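
For anyone trying to reproduce the setup, here is a minimal sketch (not part of the original report) of enabling speculative execution on a 2.1.x job. The property names are standard Spark configuration keys; the values other than spark.speculation are the documented defaults, shown only to make the tuning knobs explicit:

{code}
// Minimal sketch (not from the report): enable speculative execution on a job.
// All keys are standard Spark configs; values other than spark.speculation are
// the documented defaults.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("speculation-repro")
  .config("spark.speculation", "true")            // re-launch slow tasks speculatively
  .config("spark.speculation.interval", "100ms")  // how often to check for slow tasks
  .config("spark.speculation.quantile", "0.75")   // fraction of tasks finished before checking
  .config("spark.speculation.multiplier", "1.5")  // how much slower than the median counts as slow
  .getOrCreate()
{code}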



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

