spark-issues mailing list archives

From "Thomas Graves (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-11701) YARN - dynamic allocation and speculation active task accounting wrong
Date Thu, 12 Nov 2015 17:34:11 GMT

    [ https://issues.apache.org/jira/browse/SPARK-11701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15002505#comment-15002505 ]

Thomas Graves commented on SPARK-11701:
---------------------------------------

I'm not exactly sure if this is the same issue, but trying this on a version of 1.6 (not the
latest), after running a wordcount job we get a bunch of errors and it shuts down the SparkContext.

15/11/12 17:30:49 ERROR TransportChannelHandler: Connection to gsbl544n27.blue.ygrid.yahoo.com/10.213.42.242:33217 has been quiet for 120000 ms while there are outstanding requests. Assuming connection is dead; please adjust spark.network.timeout if this is wrong.
15/11/12 17:30:49 ERROR TransportResponseHandler: Still have 15 requests outstanding when connection from gsbl544n27.blue.ygrid.yahoo.com/10.213.42.242:33217 is closed
15/11/12 17:30:49 WARN YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to get executor loss reason for executor id 2 at RPC address gsbl536n11.blue.ygrid.yahoo.com:47496, but got no response. Marking as slave lost.
java.io.IOException: Connection from gsbl544n27.blue.ygrid.yahoo.com/10.213.42.242:33217 closed
        at org.apache.spark.network.client.TransportResponseHandler.channelUnregistered(TransportResponseHandler.java:104)
        at org.apache.spark.network.server.TransportChannelHandler.channelUnregistered(TransportChannelHandler.java:91)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
        at io.netty.channel.ChannelInboundHandlerAdapter.channelUnregistered(ChannelInboundHandlerAdapter.java:53)
        at io.netty.channel.AbstractChannelHandlerContext.invokeChannelUnregistered(AbstractChannelHandlerContext.java:158)
        at io.netty.channel.AbstractChannelHandlerContext.fireChannelUnregistered(AbstractChannelHandlerContext.java:144)
        at io.netty.channel.DefaultChannelPipeline.fireChannelUnregistered(DefaultChannelPipeline.java:739)
        at io.netty.channel.AbstractChannel$AbstractUnsafe$8.run(AbstractChannel.java:659)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:745)


> YARN - dynamic allocation and speculation active task accounting wrong
> ----------------------------------------------------------------------
>
>                 Key: SPARK-11701
>                 URL: https://issues.apache.org/jira/browse/SPARK-11701
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.5.1
>            Reporter: Thomas Graves
>            Priority: Critical
>
> I am using dynamic container allocation and speculation and am seeing issues with the
> active task accounting.  The Executor UI still shows active tasks on an executor even though
> the job/stage has completed.  I think it's also affecting dynamic allocation's ability to
> release containers, because it thinks there are still tasks.
> It's easily reproduced using spark-shell: turn on dynamic allocation, then run just
> a wordcount on a decent-sized file with the speculation parameters set low:
>  spark.dynamicAllocation.enabled true
>  spark.shuffle.service.enabled true
>  spark.dynamicAllocation.maxExecutors 10
>  spark.dynamicAllocation.minExecutors 2
>  spark.dynamicAllocation.initialExecutors 10
>  spark.dynamicAllocation.executorIdleTimeout 40s
> $SPARK_HOME/bin/spark-shell --conf spark.speculation=true --conf spark.speculation.multiplier=0.2 --conf spark.speculation.quantile=0.1 --master yarn --deploy-mode client --executor-memory 4g --driver-memory 4g
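
For anyone scripting the reproduction, the settings quoted above can be assembled into a single spark-shell invocation. The following is only a sketch and not part of the original report: the helper name build_spark_shell_command and its extra_conf parameter are hypothetical, and it assumes the six dynamic allocation settings may be passed as --conf flags rather than through a defaults file. It builds the argument list and prints it; it does not launch Spark.

```python
# Settings listed in the issue description (shown there in defaults-file style).
DYNAMIC_ALLOCATION_CONF = {
    "spark.dynamicAllocation.enabled": "true",
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.maxExecutors": "10",
    "spark.dynamicAllocation.minExecutors": "2",
    "spark.dynamicAllocation.initialExecutors": "10",
    "spark.dynamicAllocation.executorIdleTimeout": "40s",
}

# Speculation settings passed on the command line in the issue description.
SPECULATION_CONF = {
    "spark.speculation": "true",
    "spark.speculation.multiplier": "0.2",
    "spark.speculation.quantile": "0.1",
}


def build_spark_shell_command(spark_home="$SPARK_HOME", extra_conf=None):
    """Return the spark-shell argument list for the reproduction.

    Hypothetical helper: extra_conf lets a caller add or override settings.
    """
    conf = {**DYNAMIC_ALLOCATION_CONF, **SPECULATION_CONF, **(extra_conf or {})}
    argv = [f"{spark_home}/bin/spark-shell"]
    for key, value in conf.items():
        argv += ["--conf", f"{key}={value}"]
    # Remaining flags are copied from the command in the issue description.
    argv += ["--master", "yarn", "--deploy-mode", "client",
             "--executor-memory", "4g", "--driver-memory", "4g"]
    return argv


if __name__ == "__main__":
    # Plain join so that $SPARK_HOME is still expanded by the shell.
    print(" ".join(build_spark_shell_command()))
```

Passing something like extra_conf={"spark.network.timeout": "300s"} would add the timeout that the log message in the comment refers to; the value 300s is only a placeholder, not a recommendation from the report.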



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

