spark-issues mailing list archives

From "Marcelo Vanzin (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (SPARK-16711) YarnShuffleService doesn't re-init properly on YARN rolling upgrade
Date Fri, 02 Sep 2016 17:43:20 GMT

     [ https://issues.apache.org/jira/browse/SPARK-16711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcelo Vanzin resolved SPARK-16711.
------------------------------------
       Resolution: Fixed
         Assignee: Thomas Graves
    Fix Version/s: 2.1.0

> YarnShuffleService doesn't re-init properly on YARN rolling upgrade
> -------------------------------------------------------------------
>
>                 Key: SPARK-16711
>                 URL: https://issues.apache.org/jira/browse/SPARK-16711
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, YARN
>    Affects Versions: 1.5.2
>            Reporter: Thomas Graves
>            Assignee: Thomas Graves
>             Fix For: 2.1.0
>
>
> During a YARN rolling upgrade, the Spark YarnShuffleService does not re-initialize
> the tokens soon enough. Requests from running applications then fail with a
> NullPointerException rather than an IOException, so clients do not retry, and the
> application fails outright when a retry would have succeeded.
> 2016-07-22 23:22:05,460 [shuffle-server-1] ERROR server.TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 6235606084052282795
> java.lang.NullPointerException: Password cannot be null if SASL is enabled
>         at org.spark-project.guava.base.Preconditions.checkNotNull(Preconditions.java:208)
>         at org.apache.spark.network.sasl.SparkSaslServer.encodePassword(SparkSaslServer.java:196)
>         at org.apache.spark.network.sasl.SparkSaslServer$DigestCallbackHandler.handle(SparkSaslServer.java:166)
>         at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
>         at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>         at org.apache.spark.network.sasl.SparkSaslServer.response(SparkSaslServer.java:119)
>         at org.apache.spark.network.sasl.SaslRpcHandler.receive(SaslRpcHandler.java:101)
>         at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
>         at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
>         at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>         at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>         at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>         at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>         at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>         at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
>         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>         at java.lang.Thread.run(Thread.java:745)
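
The description above says the exception type decides whether a client retries. A minimal sketch of that behaviour follows, assuming a client that retries only on IOException. It is not Spark's actual retry code: RetrySketch, callWithRetry and simulate are hypothetical names made up for illustration, and the "service not ready" message is invented.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetrySketch {

    /**
     * Hypothetical retry helper: retries only when the operation throws
     * an IOException. Any other exception (such as a
     * NullPointerException) propagates on the first failure.
     */
    static <T> T callWithRetry(Callable<T> op, int maxRetries) throws Exception {
        for (int attempt = 0; ; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted
                }
                // otherwise fall through and try again
            }
        }
    }

    /**
     * Simulates a service that is not ready for its first two calls
     * (assumed stand-in for a shuffle service that has not yet
     * re-initialized) and reports how the client fared.
     */
    static String simulate(boolean failWithIOException) {
        final int[] calls = {0};
        try {
            String result = callWithRetry(() -> {
                calls[0]++;
                if (calls[0] <= 2) {
                    if (failWithIOException) {
                        throw new IOException("service not ready");
                    }
                    throw new NullPointerException(
                        "Password cannot be null if SASL is enabled");
                }
                return "ok";
            }, 3);
            return result + " after " + calls[0] + " call(s)";
        } catch (Exception e) {
            return "failed with " + e.getClass().getSimpleName()
                + " after " + calls[0] + " call(s)";
        }
    }

    public static void main(String[] args) {
        System.out.println(simulate(true));
        System.out.println(simulate(false));
    }
}
```

With the IOException variant the third call succeeds; with the NullPointerException variant the first failure is final, which matches the failure mode reported here.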



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org

